Commit b2c59aa4, authored by Yan Ru Pei, committed by GitHub

feat(replay): add shared loadgen workload paths [DYN-2510] (#7593)


Signed-off-by: PeaBrane <yanrpei@gmail.com>
parent 2b36b175
@@ -7,8 +7,8 @@ subtitle: Replay Mooncake-style traces through the mocker in offline or online m
 This guide covers trace replay support for Mooncake-style JSONL traces via `python -m dynamo.replay`,
 which prints an AIPerf-style summary table, writes the full replay report JSON to disk, and exposes
-`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and synthetic replay inputs
-directly.
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, closed-loop concurrency, and
+synthetic workload inputs directly.
 Unlike normal `dynamo.mocker` usage, offline replay does not launch workers, register endpoints, or
 require NATS, etcd, or a frontend. Online replay does exercise the live mock-worker runtime path.
@@ -47,6 +47,24 @@ python -m dynamo.replay \
   --report-json /tmp/replay-report.json
 ```
+Run synthetic workload replay when you want shared-prefix or multi-turn structure without a trace
+file:
+```bash
+python -m dynamo.replay \
+  --input-tokens 5000 \
+  --output-tokens 500 \
+  --request-count 200 \
+  --turns-per-session 3 \
+  --shared-prefix-ratio 0.5 \
+  --num-prefix-groups 8 \
+  --inter-turn-delay-ms 250 \
+  --replay-mode offline \
+  --replay-concurrency 32 \
+  --extra-engine-args '{"block_size":512}' \
+  --report-json /tmp/replay-report.json
+```
 `python -m dynamo.replay` prints an AIPerf-style summary table to stdout and writes the full replay
 report JSON to disk.
@@ -65,12 +83,29 @@ Example:
 {"timestamp": 0, "input_length": 6755, "output_length": 500, "hash_ids": [0, 1, 2, 3]}
 ```
-The mocker synthesizes token blocks from `hash_ids` using the configured `--block-size`, so the
+Replay also supports multi-turn sessions. Use the same `session_id` on all turns in a session. The
+first turn uses `timestamp` or `created_time`; later turns may use either:
+- `delay` or `delay_ms` directly
+- or an absolute later `timestamp`, in which case replay infers the inter-turn delay from the
+  previous turn timestamp
+Example:
+```json
+{"session_id":"session-a","timestamp":1000,"input_length":2048,"output_length":128,"hash_ids":[1,2,3,4]}
+{"session_id":"session-a","delay":250,"input_length":2560,"output_length":128,"hash_ids":[1,2,3,4,5]}
+{"session_id":"session-b","timestamp":1010,"input_length":1024,"output_length":64,"hash_ids":[9,10]}
+{"session_id":"session-b","delay_ms":50,"input_length":1536,"output_length":64,"hash_ids":[9,10,11]}
+```
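As a rough illustration of the multi-turn timing rules described in this hunk, the sketch below groups JSONL records by `session_id` and derives each session's start time and inter-turn delays. The helper name `session_schedules` is hypothetical and not part of `dynamo.replay`. It assumes that `delay` and `delay_ms` are both expressed in milliseconds, and that a delay-based turn advances the session's nominal timestamp; the real loader may resolve these cases differently.

```python
import json


def session_schedules(lines):
    """Return {session_id: {"start": first_turn_time, "delays": [...]}}.

    Hypothetical helper for illustration only; not the dynamo.replay loader.
    """
    sessions = {}
    last_ts = {}  # nominal timestamp of the most recent turn, per session
    for line in lines:
        rec = json.loads(line)
        sid = rec["session_id"]
        if sid not in sessions:
            # First turn: anchored by `timestamp` or `created_time`.
            start = rec["timestamp"] if "timestamp" in rec else rec["created_time"]
            sessions[sid] = {"start": start, "delays": []}
            last_ts[sid] = start
            continue
        if "delay" in rec:
            delay = rec["delay"]
        elif "delay_ms" in rec:
            delay = rec["delay_ms"]
        else:
            # Absolute later timestamp: infer the delay from the previous turn.
            delay = rec["timestamp"] - last_ts[sid]
        last_ts[sid] += delay
        sessions[sid]["delays"].append(delay)
    return sessions
```

Passing the four example records above yields a start of `1000` with one delay of `250` for `session-a`, and a start of `1010` with one delay of `50` for `session-b`.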
+The mocker synthesizes token blocks from `hash_ids` using the configured mocker `block_size`, so the
 replay block size must match the block size used when the trace was generated. Public Mooncake
 traces are commonly block-level hashes at `512` tokens per hash ID, so replaying them with the
-default mocker `block_size=64` will fail once `input_length > len(hash_ids) * 64`. For
-`engine_type=sglang`, replay still uses canonical `block_size` internally; `sglang.page_size` is
-accepted as a compatibility alias and is normalized into `block_size` before replay starts.
+default mocker `block_size=64` will fail once `input_length > len(hash_ids) * 64`. Set that
+through `--extra-engine-args '{"block_size":512}'`. For `engine_type=sglang`, replay still uses
+canonical `block_size` internally; `sglang.page_size` is accepted as a compatibility alias and is
+normalized into `block_size` before replay starts.
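The block-size constraint can be checked before running a replay. The following is a minimal sketch, assuming trace records are already parsed into dicts; `block_size_fits` and `expand_hash_ids` are hypothetical names. The expansion mirrors the repeat-then-truncate behavior of the bench helper `tokens_from_request` changed in this commit, which may not be exactly how the mocker itself synthesizes tokens.

```python
def block_size_fits(record, block_size):
    """True when `hash_ids` cover the prompt at the given block size.

    Hypothetical pre-flight check based on the documented failure condition
    `input_length > len(hash_ids) * block_size`.
    """
    return record["input_length"] <= len(record["hash_ids"]) * block_size


def expand_hash_ids(hash_ids, block_size, input_length=0):
    """Repeat each hash ID `block_size` times, then trim to `input_length`.

    An `input_length` of 0 means "no truncation", matching the bench helper.
    """
    tokens = [h for h in hash_ids for _ in range(block_size)]
    if 0 < input_length < len(tokens):
        del tokens[input_length:]
    return tokens


record = {"input_length": 2048, "hash_ids": [1, 2, 3, 4]}
fits_at_512 = block_size_fits(record, 512)  # 4 * 512 covers 2048 tokens
fits_at_64 = block_size_fits(record, 64)    # 4 * 64 covers only 256 tokens
```

Running a check like this over a trace file is one way to notice a block-size mismatch before replay reports a validation error.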
 ## Replay Surfaces
@@ -85,10 +120,19 @@ The dedicated replay CLI exposes:
 - `--replay-concurrency`
 - `--arrival-interval-ms`
 - `--arrival-speedup-ratio`
+- `--turns-per-session`
+- `--shared-prefix-ratio`
+- `--num-prefix-groups`
+- `--inter-turn-delay-ms`
 - `--extra-engine-args` (JSON string)
 - `--router-config` (JSON string)
 - `--report-json`
+Defaults:
+- `--replay-mode offline`
+- `--router-mode round_robin`
 Example:
 ```bash
@@ -115,9 +159,10 @@ SGLang replay uses the same CLI surface. A minimal extra-engine-args file can us
 }
 ```
-Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Unspecified fields
-fall back to the same defaults used by `MockEngineArgs::default()` and
-`KvRouterConfig::default()`.
+Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Engine settings such
+as `block_size`, `engine_type`, `dp_size`, `speedup_ratio`, and `decode_speedup_ratio` belong in
+`--extra-engine-args`, not as top-level replay CLI flags. Unspecified fields fall back to the same
+defaults used by `MockEngineArgs::default()` and `KvRouterConfig::default()`.
 ### Synthetic Replay
@@ -137,6 +182,19 @@ python -m dynamo.replay \
 This is useful for parameter sweeps where Mooncake-style prefix structure is not required.
+When `--turns-per-session > 1`, `--request-count` is interpreted as the number of sessions rather
+than the total number of emitted turns. The total completed request count becomes:
+- `request_count * turns_per_session`
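The request-count arithmetic is easy to get wrong when sizing a sweep, so here is a small sketch of it. The function name `total_completed_requests` is hypothetical; only the multiplication itself comes from the text above.

```python
def total_completed_requests(request_count, turns_per_session=1):
    """Number of requests a synthetic replay completes.

    With multi-turn sessions, `request_count` counts sessions, so the
    completed total is sessions multiplied by turns per session.
    """
    if request_count < 0 or turns_per_session < 1:
        raise ValueError("request_count must be >= 0 and turns_per_session >= 1")
    return request_count * turns_per_session


# The synthetic workload example in this guide uses 200 sessions of 3 turns.
example_total = total_completed_requests(200, 3)
```

For that example the replay report should therefore account for 600 completed requests, not 200.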
+Synthetic workload options:
+- `--turns-per-session`: number of turns in each synthetic session
+- `--shared-prefix-ratio`: fraction of prompt blocks shared inside a prefix group
+- `--num-prefix-groups`: number of shared-prefix groups; `0` disables grouping
+- `--inter-turn-delay-ms`: constant delay applied after each completed turn before the next turn in
+  the same session becomes eligible
 ## Modes
 ### Fixed-Schedule Replay
@@ -155,8 +213,8 @@ This is the right mode when you want deterministic replay of the original arriva
 ### Closed-Loop Concurrency Replay
-Use `--replay-concurrency` to ignore trace arrival timing and keep a fixed number of requests in
-flight:
+Use `--replay-concurrency` to ignore first-turn trace arrival timing and keep a fixed number of
+requests in flight:
 ```bash
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
@@ -167,6 +225,13 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 This mode is useful when you want to compare scheduler behavior under a fixed offered concurrency rather than the original trace schedule.
+For multi-turn sessions, concurrency mode still enforces session order and inter-turn delays:
+- first-turn timestamps are ignored
+- turn `n+1` is not eligible until turn `n` completes
+- `delay` / `delay_ms` / synthetic `--inter-turn-delay-ms` are still applied after completion
+- TTFT is measured from actual dispatch under the cap, not from the ignored trace timestamp
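The concurrency-mode rules listed above can be sketched as a small event-driven simulation. This is an illustrative model only, not the replay runtime: `simulate_closed_loop`, the `service_ms` field, and the fixed per-turn service times are all assumptions made for the example.

```python
import heapq


def simulate_closed_loop(sessions, concurrency):
    """Simulate closed-loop admission for multi-turn sessions.

    `sessions` is a list of sessions; each session is a list of turns shaped
    like {"service_ms": float, "delay_ms": float}. Returns a dict mapping
    (session_index, turn_index) to dispatch and completion times in ms.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # First-turn timestamps are ignored: every first turn is eligible at t=0.
    ready = [(0.0, sid, 0) for sid, turns in enumerate(sessions) if turns]
    heapq.heapify(ready)
    in_flight = []  # min-heap of (complete_ms, session_index, turn_index)
    timeline = {}
    now = 0.0
    while ready or in_flight:
        # Dispatch eligible turns while there is room under the cap.
        while ready and len(in_flight) < concurrency and ready[0][0] <= now:
            _, sid, turn = heapq.heappop(ready)
            done_ms = now + sessions[sid][turn]["service_ms"]
            # A TTFT measurement would start from dispatch_ms.
            timeline[(sid, turn)] = {"dispatch_ms": now, "complete_ms": done_ms}
            heapq.heappush(in_flight, (done_ms, sid, turn))
        # Advance to the next completion, or to the next eligibility time
        # when a slot is free.
        wake = []
        if in_flight:
            wake.append(in_flight[0][0])
        if ready and len(in_flight) < concurrency:
            wake.append(ready[0][0])
        now = max(now, min(wake))
        # Retire finished turns; turn n+1 becomes eligible only after turn n
        # completes plus its inter-turn delay.
        while in_flight and in_flight[0][0] <= now:
            done_ms, sid, turn = heapq.heappop(in_flight)
            nxt = turn + 1
            if nxt < len(sessions[sid]):
                delay = sessions[sid][nxt].get("delay_ms", 0.0)
                heapq.heappush(ready, (done_ms + delay, sid, nxt))
    return timeline
```

With a cap of 1, a two-turn session whose second turn has a 250 ms delay lets a second session's first turn run in the gap, which is the behavior the list above describes: session order and delays are preserved while the cap decides the actual dispatch time.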
 ### Online Replay
 Online replay launches the mock workers and replays the trace against the live runtime path. This
@@ -256,14 +321,15 @@ If `--report-json` is not provided, `python -m dynamo.replay` writes a timestamp
 Shared replay constraints:
 - aggregated mode
-- `--engine-type vllm|sglang`
-- `--data-parallel-size 1`
+- `extra_engine_args.engine_type` must be `vllm` or `sglang`
+- `extra_engine_args.dp_size` must be `1`
 Additional offline constraints:
 - offline `kv_router` requires `num_workers > 1`
-- public single-worker offline replay still uses the legacy single-worker runtime for `vllm`
-  while `sglang` goes through the shared multi-worker replay runtime even when `num_workers=1`
+- single-worker offline replay is still a dedicated fast path for `vllm`, but it now supports both
+  flat request replay and workload-driven multi-turn replay
+- `sglang` still goes through the shared multi-worker replay runtime even when `num_workers=1`
 Additional online constraints:
@@ -276,9 +342,12 @@ If you violate those constraints, replay fails immediately with a validation err
 - `python -m dynamo.replay` requires exactly one of:
   either a trace file, or all of `--input-tokens`, `--output-tokens`, and `--request-count`
 - `--replay-concurrency` works with both trace replay and synthetic replay
-- `--speedup-ratio` still affects simulated timing
+- mocker compute-speed knobs such as `speedup_ratio` still affect simulated timing when passed via
+  `--extra-engine-args`
 - `--arrival-speedup-ratio` affects trace timestamps, not worker compute speed
 - `--arrival-interval-ms` only applies to synthetic replay
+- `--turns-per-session`, `--shared-prefix-ratio`, `--num-prefix-groups`, and
+  `--inter-turn-delay-ms` only apply to synthetic replay
 - `--extra-engine-args` and `--router-config` are JSON strings on the standalone replay CLI
 - offline replay does not need planner runtime setup, router registration, or external event transport
 - the replay block size should match the trace block size, because token synthesis expands `hash_ids`
......
@@ -125,8 +125,11 @@ python -m dynamo.mocker \
 ## Trace Replay
 The mocker supports replaying Mooncake-style traces through the dedicated replay CLI, which exposes
-`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and the synthetic replay path
-directly:
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, closed-loop concurrency
+admission, and synthetic workload generation directly:
+The replay CLI defaults to `--replay-mode offline` and `--router-mode round_robin`. Engine settings
+such as `block_size`, `engine_type`, and compute speedups still belong in `--extra-engine-args`.
 ```bash
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
@@ -154,9 +157,40 @@ python -m dynamo.replay \
   --report-json /tmp/replay-report.json
 ```
+Synthetic replay also supports workload-style generation for shared-prefix and multi-turn tests:
+```bash
+python -m dynamo.replay \
+  --input-tokens 5000 \
+  --output-tokens 500 \
+  --request-count 200 \
+  --turns-per-session 3 \
+  --shared-prefix-ratio 0.5 \
+  --num-prefix-groups 8 \
+  --inter-turn-delay-ms 250 \
+  --replay-mode offline \
+  --replay-concurrency 32 \
+  --extra-engine-args '{"block_size":512}' \
+  --report-json /tmp/replay-report.json
+```
+For trace files, replay also understands multi-turn sessions when records share `session_id`. The
+first turn uses `timestamp`/`created_time`; later turns can use `delay` or `delay_ms`:
+```json
+{"session_id":"session-a","timestamp":1000,"input_length":2048,"output_length":128,"hash_ids":[1,2,3,4]}
+{"session_id":"session-a","delay":250,"input_length":2560,"output_length":128,"hash_ids":[1,2,3,4,5]}
+```
 The standalone replay CLI prints an AIPerf-style summary table to stdout and writes the full replay
 report JSON to disk.
+Timing semantics:
+- trace mode honors first-turn timestamps and inter-turn delays
+- concurrency mode ignores first-turn timestamps but still enforces inter-turn delays
+- in concurrency mode, TTFT is measured from actual dispatch under the in-flight cap
 For full usage, constraints, and benchmarking guidance, see [Mocker Trace Replay](../benchmarks/mocker-trace-replay.md).
 Replay supports aggregated `vllm` and `sglang` engine configs. Internally replay uses canonical
......
@@ -40,11 +40,11 @@ reqwest = { workspace = true }
 serde = { workspace = true }
 serde_json = { workspace = true }
 tokio = { workspace = true }
+dynamo-mocker = { workspace = true }
 [dev-dependencies]
 async-trait = { workspace = true }
 dynamo-kv-router = { workspace = true, features = ["bench"] }
-dynamo-mocker = { workspace = true }
 dynamo-tokens = { workspace = true }
 minstant = "0.1.7"
 plotters = { version = "0.3", default-features = false, features = ["svg_backend", "line_series", "point_series", "full_palette"] }
......
@@ -9,16 +9,11 @@ use clap::Parser;
 use common::NoopSequencePublisher;
 use dynamo_kv_router::protocols::WorkerWithDpRank;
 use dynamo_kv_router::{ActiveSequencesMultiWorker, OverlapScores, SequenceRequest};
-use dynamo_mocker::common::protocols::{DirectRequest, KvEventPublishers, OutputSignal};
-use dynamo_mocker::scheduler::Scheduler;
-use dynamo_mocker::scheduler::SchedulerHandle;
+use dynamo_mocker::loadgen::Trace;
 use dynamo_tokens::SequenceHash;
 use std::collections::HashMap;
 use std::sync::Arc;
-use tokio::sync::mpsc;
-use tokio::task::JoinHandle;
 use tokio::time::{Duration, Instant};
-use uuid::Uuid;
 #[derive(Parser, Debug)]
 #[clap(
...@@ -76,69 +71,46 @@ struct SequenceTrace { ...@@ -76,69 +71,46 @@ struct SequenceTrace {
/// completed=true → Free /// completed=true → Free
/// 4. Collect timestamps for later replay /// 4. Collect timestamps for later replay
async fn generate_sequence_events( async fn generate_sequence_events(
traces: &[Vec<MooncakeRequest>], traces: &[Trace],
num_gpu_blocks: usize, num_gpu_blocks: usize,
block_size: u32, block_size: u32,
trace_simulation_duration_ms: u64, trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<SequenceTrace>>> { ) -> anyhow::Result<Vec<Vec<SequenceTrace>>> {
println!("Generating sequence events..."); println!("Generating sequence events...");
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?; let artifacts = generate_replay_artifacts(
traces,
let scaled_traces: Vec<_> = traces num_gpu_blocks,
.iter() block_size,
.map(|worker_trace| scale_mooncake_trace(worker_trace, trace_simulation_duration_ms)) trace_simulation_duration_ms,
.collect(); )
.await?;
let progress = make_progress_bar(Some(traces.iter().map(|w| w.len() as u64).sum::<u64>())); let mut all_traces = Vec::with_capacity(artifacts.len());
let mut tasks: Vec<JoinHandle<anyhow::Result<Vec<SequenceTrace>>>> = Vec::new();
for worker_trace in scaled_traces {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
let (output_tx, mut output_rx) = mpsc::unbounded_channel::<OutputSignal>();
// No KvCacheEventSink — we only need output signals
let scheduler = Scheduler::new(
sched_args,
0,
Some(output_tx),
KvEventPublishers::default(),
None,
);
// Pre-compute metadata for each request before submission for artifact in artifacts {
let mut metadata: HashMap<Uuid, RequestMetadata> = HashMap::new(); let metadata = artifact
for req in &worker_trace { .requests
let block_hashes: Vec<SequenceHash> = req
.hash_ids
.iter() .iter()
.map(|&id| local_block_hash_from_id(id, block_size).0) .map(|request| {
.collect(); (
let isl = req.hash_ids.len() * block_size as usize; request.uuid,
metadata.insert(
req.uuid,
RequestMetadata { RequestMetadata {
block_hashes, block_hashes: request.replay_hashes.sequence_hashes.clone(),
isl, isl: request.input_length,
output_length: req.output_length, output_length: request.output_length as u64,
}, },
); )
} })
.collect::<HashMap<_, _>>();
// Spawn drain task that converts OutputSignals → SequenceTrace entries
let drain_handle: JoinHandle<Vec<SequenceTrace>> = tokio::spawn(async move {
let mut entries = Vec::new(); let mut entries = Vec::new();
let mut seen: HashMap<Uuid, bool> = HashMap::new(); let mut seen = HashMap::new();
while let Some(signal) = output_rx.recv().await { for timed_signal in artifact.output_signals {
let signal = timed_signal.signal;
let request_id = signal.uuid.to_string(); let request_id = signal.uuid.to_string();
if let std::collections::hash_map::Entry::Vacant(e) = seen.entry(signal.uuid) { if let std::collections::hash_map::Entry::Vacant(entry) = seen.entry(signal.uuid) {
e.insert(false); entry.insert(());
if let Some(meta) = metadata.get(&signal.uuid) { if let Some(meta) = metadata.get(&signal.uuid) {
entries.push(SequenceTrace { entries.push(SequenceTrace {
entry: SequenceTraceEntry::Add { entry: SequenceTraceEntry::Add {
...@@ -147,92 +119,26 @@ async fn generate_sequence_events( ...@@ -147,92 +119,26 @@ async fn generate_sequence_events(
isl: meta.isl, isl: meta.isl,
output_length: meta.output_length, output_length: meta.output_length,
}, },
timestamp_us: 0, // rescaled later timestamp_us: timed_signal.timestamp_us,
}); });
entries.push(SequenceTrace { entries.push(SequenceTrace {
entry: SequenceTraceEntry::PrefillComplete { entry: SequenceTraceEntry::PrefillComplete {
request_id: request_id.clone(), request_id: request_id.clone(),
}, },
timestamp_us: 0, timestamp_us: timed_signal.timestamp_us,
}); });
} }
} }
if signal.completed { if signal.completed {
seen.insert(signal.uuid, true);
entries.push(SequenceTrace { entries.push(SequenceTrace {
entry: SequenceTraceEntry::Free { request_id }, entry: SequenceTraceEntry::Free { request_id },
timestamp_us: 0, timestamp_us: timed_signal.timestamp_us,
});
}
}
entries
});
// Submit requests at scaled timing
let mut i = 0;
let mut target = Instant::now();
let start = target;
while i < worker_trace.len() {
let prev_i = i;
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
});
i += 1;
while i < worker_trace.len()
&& worker_trace[i].timestamp == worker_trace[i - 1].timestamp
{
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
}); });
i += 1;
}
if i < worker_trace.len() {
target += Duration::from_millis(
worker_trace[i].timestamp - worker_trace[i - 1].timestamp,
);
} }
tokio::time::sleep_until(target).await;
progress.inc((i - prev_i) as u64);
} }
// Drop scheduler → CancelGuard fires → background task exits → all_traces.push(entries);
// output_tx dropped → drain task sees None
drop(scheduler);
let mut entries = drain_handle.await?;
// Assign monotonically increasing timestamps based on entry order
let total_us = (Instant::now() - start).as_micros() as u64;
let num_entries = entries.len() as u64;
for (idx, entry) in entries.iter_mut().enumerate() {
entry.timestamp_us = if num_entries > 1 {
idx as u64 * total_us / (num_entries - 1)
} else {
0
};
}
Ok(entries)
}));
}
let mut all_traces = Vec::new();
for task in tasks {
all_traces.push(task.await??);
} }
let total_adds = all_traces let total_adds = all_traces
...@@ -503,30 +409,44 @@ async fn run_tests() -> anyhow::Result<()> { ...@@ -503,30 +409,44 @@ async fn run_tests() -> anyhow::Result<()> {
)); ));
{ {
let mut f = File::create(&path)?; let mut f = File::create(&path)?;
for (i, (hash_ids, output_length)) in
[(&[0u64, 1, 2] as &[u64], 10u64), (&[0, 1, 3, 4], 10)]
.iter()
.enumerate()
{
writeln!( writeln!(
f, f,
"{}", "{}",
serde_json::json!({ serde_json::json!({
"timestamp": i as u64, "session_id": "session-a",
"hash_ids": hash_ids, "timestamp": 0,
"output_length": output_length, "input_length": 4,
"hash_ids": [0u64, 1, 2, 3],
"output_length": 10u64,
})
)?;
writeln!(
f,
"{}",
serde_json::json!({
"session_id": "session-a",
"delay": 5.0,
"input_length": 4,
"hash_ids": [4u64, 5, 6, 7],
"output_length": 10u64,
}) })
)?; )?;
}
} }
let traces = process_mooncake_trace(path.to_str().unwrap(), 1, 1, 2, 42)?; let traces = process_mooncake_trace(path.to_str().unwrap(), 512, 1, 1, 1, 42)?;
std::fs::remove_file(&path).ok(); std::fs::remove_file(&path).ok();
println!( println!(
"Loaded {} workers, {} total requests", "Loaded {} workers, {} total requests",
traces.len(), traces.len(),
traces.iter().map(|t| t.len()).sum::<usize>() traces
.iter()
.map(|trace| trace
.sessions
.iter()
.map(|session| session.turns.len())
.sum::<usize>())
.sum::<usize>()
); );
let seq_traces = generate_sequence_events(&traces, 1048576, 512, 100).await?; let seq_traces = generate_sequence_events(&traces, 1048576, 512, 100).await?;
...@@ -545,6 +465,29 @@ async fn run_tests() -> anyhow::Result<()> { ...@@ -545,6 +465,29 @@ async fn run_tests() -> anyhow::Result<()> {
assert!(total_adds > 0, "expected at least one Add event"); assert!(total_adds > 0, "expected at least one Add event");
assert!(total_frees > 0, "expected at least one Free event"); assert!(total_frees > 0, "expected at least one Free event");
assert_eq!(total_adds, total_frees, "adds and frees should match"); assert_eq!(total_adds, total_frees, "adds and frees should match");
for trace in &seq_traces {
assert!(
trace
.windows(2)
.all(|window| window[1].timestamp_us >= window[0].timestamp_us)
);
}
let first_free_us = seq_traces[0]
.iter()
.find_map(|entry| match entry.entry {
SequenceTraceEntry::Free { .. } => Some(entry.timestamp_us),
_ => None,
})
.unwrap();
let second_add_us = seq_traces[0]
.iter()
.filter_map(|entry| match entry.entry {
SequenceTraceEntry::Add { .. } => Some(entry.timestamp_us),
_ => None,
})
.nth(1)
.unwrap();
assert!(second_add_us >= first_free_us);
println!("All tests passed."); println!("All tests passed.");
Ok(()) Ok(())
@@ -567,6 +510,7 @@ async fn main() -> anyhow::Result<()> {
     };
     let traces = process_mooncake_trace(
         path,
+        args.common.block_size,
         args.common.trace_length_factor,
         args.common.trace_duplication_factor,
         args.common.num_unique_inference_workers,
......
@@ -12,7 +12,11 @@ use dynamo_kv_router::protocols::{
 };
 pub use dynamo_kv_router::test_utils::{NoopSequencePublisher, SimpleWorkerConfig};
 use dynamo_mocker::common::protocols::{
-    DirectRequest, KvCacheEventSink, KvEventPublishers, MockEngineArgs,
+    DirectRequest, KvCacheEventSink, KvEventPublishers, MockEngineArgs, OutputSignal,
+};
+use dynamo_mocker::loadgen::{
+    ArrivalSpec, DelaySpec, LengthSpec, ReplayRequestHashes, RouterSequence, SequenceHashMode,
+    SessionPartitionSpec, SyntheticTraceSpec, Trace,
 };
 use dynamo_mocker::scheduler::Scheduler;
 use dynamo_mocker::scheduler::SchedulerHandle;
@@ -24,6 +28,7 @@ use serde::{Deserialize, Serialize};
 use std::fs::File;
 use std::io::{BufRead, BufReader};
 use std::sync::{Arc, Mutex};
+use tokio::sync::mpsc;
 use tokio::task::JoinHandle;
 use tokio::time::Instant;
 use uuid::Uuid;
@@ -101,6 +106,8 @@ pub struct MooncakeRequest {
     #[serde(default = "Uuid::new_v4")]
     pub uuid: uuid::Uuid,
     pub timestamp: u64,
+    #[serde(default)]
+    pub input_length: usize,
     pub hash_ids: Vec<u64>,
     pub output_length: u64,
 }
@@ -133,6 +140,35 @@ impl KvCacheEventSink for EventCollector {
     }
 }
+#[derive(Clone)]
+pub struct TimedReplayRequest {
+    pub uuid: Uuid,
+    pub timestamp_us: u64,
+    pub scheduled_ready_at_ms: f64,
+    pub input_length: usize,
+    pub output_length: usize,
+    pub replay_hashes: ReplayRequestHashes,
+}
+#[derive(Clone)]
+pub struct TimedOutputSignal {
+    pub signal: OutputSignal,
+    pub timestamp_us: u64,
+}
+#[derive(Clone)]
+pub struct TimedKvEvent {
+    pub event: KvCacheEvent,
+    pub timestamp_us: u64,
+}
+#[derive(Clone)]
+pub struct WorkerReplayArtifacts {
+    pub requests: Vec<TimedReplayRequest>,
+    pub output_signals: Vec<TimedOutputSignal>,
+    pub kv_events: Vec<TimedKvEvent>,
+}
 /// Load the mooncake trace from disk into a flat list of requests.
 pub fn load_mooncake_trace(path: &str) -> anyhow::Result<Vec<MooncakeRequest>> {
     let file = File::open(path)?;
@@ -257,11 +293,15 @@ pub fn duplicate_traces(requests: Vec<MooncakeRequest>, factor: usize) -> Vec<Mo
 /// Expand a request's block-level hash_ids into per-token IDs by repeating each
 /// hash_id `block_size` times.
 pub fn tokens_from_request(request: &MooncakeRequest, block_size: u32) -> Vec<u32> {
-    request
+    let mut tokens = request
         .hash_ids
         .iter()
         .flat_map(|id| (0..block_size).map(|_| *id as u32))
-        .collect()
+        .collect::<Vec<_>>();
+    if request.input_length > 0 && request.input_length < tokens.len() {
+        tokens.truncate(request.input_length);
+    }
+    tokens
 }
 /// Compute the LocalBlockHash for a block-level hash_id the same way the mock
@@ -304,15 +344,19 @@ pub struct BenchmarkResults {
 /// Load, transform, and partition the mooncake trace into per-worker request lists.
 pub fn process_mooncake_trace(
     path: &str,
+    block_size: u32,
     trace_length_factor: usize,
     trace_duplication_factor: usize,
     num_workers: usize,
     seed: u64,
-) -> anyhow::Result<Vec<Vec<MooncakeRequest>>> {
-    let requests = load_mooncake_trace(path)?;
-    let requests = expand_trace_lengths(requests, trace_length_factor);
-    let requests = duplicate_traces(requests, trace_duplication_factor);
-    Ok(partition_trace(requests, num_workers, seed))
+) -> anyhow::Result<Vec<Trace>> {
+    let trace = Trace::from_mooncake(std::path::Path::new(path), block_size as usize)?
+        .expand_hash_prefix_depth(trace_length_factor)
+        .duplicate_hash_space(trace_duplication_factor);
+    Ok(trace.partition_by_session(SessionPartitionSpec::Random {
+        num_partitions: num_workers,
+        seed,
+    }))
 }
 /// Build default MockEngineArgs suitable for event generation.
...@@ -330,98 +374,155 @@ pub fn default_mock_engine_args( ...@@ -330,98 +374,155 @@ pub fn default_mock_engine_args(
.build()?) .build()?)
} }
/// Replay each worker's request trace through a mock engine in real-time to async fn replay_worker_trace(
/// produce the KV cache events (store/remove/clear) that the engine would emit. trace: Trace,
/// sched_args: MockEngineArgs,
/// Returns one event list per worker, each entry paired with the wall-clock
/// instant it was produced.
pub async fn generate_kv_events(
traces: &[Vec<MooncakeRequest>],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64, trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<(KvCacheEvent, Instant)>>> { progress: ProgressBar,
println!("Generating events..."); ) -> anyhow::Result<WorkerReplayArtifacts> {
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?; let total_turns = trace
.sessions
let scaled_traces = traces
.iter() .iter()
.map(|worker_trace| scale_mooncake_trace(worker_trace, trace_simulation_duration_ms)); .map(|session| session.turns.len())
.sum::<usize>();
let progress = make_progress_bar(Some( let mut driver = trace
traces.iter().map(|worker| worker.len() as u64).sum::<u64>(), .rescale_ready_span(trace_simulation_duration_ms)?
)); .into_trace_driver()?;
let mut tasks: Vec<JoinHandle<Vec<(KvCacheEvent, Instant)>>> = Vec::new();
for worker_trace in scaled_traces {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
let collector = EventCollector::new(); let collector = EventCollector::new();
let (output_tx, mut output_rx) = mpsc::unbounded_channel::<OutputSignal>();
let scheduler = Scheduler::new( let scheduler = Scheduler::new(
sched_args, sched_args,
0, 0,
None, Some(output_tx),
KvEventPublishers::new(Some(collector.clone()), None), KvEventPublishers::new(Some(collector.clone()), None),
None, None,
); );
let start = Instant::now();
let mut requests = Vec::with_capacity(total_turns);
let mut output_signals = Vec::new();
let mut completed_turns = 0usize;
while completed_turns < total_turns {
let now_ms = start.elapsed().as_secs_f64() * 1000.0;
for ready_turn in driver.pop_ready(now_ms, usize::MAX) {
let replay_hashes = ready_turn.replay_hashes.ok_or_else(|| {
anyhow::anyhow!("bench replay requires synthesized request hashes")
})?;
requests.push(TimedReplayRequest {
uuid: ready_turn.request_uuid,
timestamp_us: start.elapsed().as_micros() as u64,
scheduled_ready_at_ms: ready_turn.scheduled_ready_at_ms,
input_length: ready_turn.request.tokens.len(),
output_length: ready_turn.request.max_output_tokens,
replay_hashes,
});
scheduler.receive(ready_turn.request);
progress.inc(1);
}
let mut i = 0; if completed_turns >= total_turns {
let mut target = Instant::now(); break;
}
while i < worker_trace.len() { match driver.next_ready_time_ms() {
let prev_i = i; Some(next_ready_ms) => {
scheduler.receive(DirectRequest { let deadline = start + Duration::from_secs_f64((next_ready_ms.max(0.0)) / 1000.0);
tokens: tokens_from_request(&worker_trace[i], block_size), tokio::select! {
max_output_tokens: worker_trace[i].output_length as usize, maybe_signal = output_rx.recv() => {
uuid: Some(worker_trace[i].uuid), let Some(signal) = maybe_signal else {
dp_rank: 0, anyhow::bail!("scheduler ended before workload replay drained");
arrival_timestamp_ms: None, };
output_signals.push(TimedOutputSignal {
signal: signal.clone(),
timestamp_us: start.elapsed().as_micros() as u64,
}); });
i += 1; if signal.completed {
completed_turns += 1;
while i < worker_trace.len() driver.on_complete(signal.uuid, start.elapsed().as_secs_f64() * 1000.0)?;
&& worker_trace[i].timestamp == worker_trace[i - 1].timestamp }
{ }
scheduler.receive(DirectRequest { _ = tokio::time::sleep_until(deadline) => {}
tokens: tokens_from_request(&worker_trace[i], block_size), }
max_output_tokens: worker_trace[i].output_length as usize, }
uuid: Some(worker_trace[i].uuid), None => {
dp_rank: 0, let Some(signal) = output_rx.recv().await else {
arrival_timestamp_ms: None, anyhow::bail!("scheduler ended before workload replay drained");
};
output_signals.push(TimedOutputSignal {
signal: signal.clone(),
timestamp_us: start.elapsed().as_micros() as u64,
}); });
i += 1; if signal.completed {
completed_turns += 1;
driver.on_complete(signal.uuid, start.elapsed().as_secs_f64() * 1000.0)?;
} }
if i < worker_trace.len() {
target += Duration::from_millis(
worker_trace[i].timestamp - worker_trace[i - 1].timestamp,
);
} }
tokio::time::sleep_until(target).await;
progress.inc((i - prev_i) as u64);
} }
}
drop(scheduler);
Ok(WorkerReplayArtifacts {
requests,
output_signals,
kv_events: collector
.get_events()
.into_iter()
.map(|(event, timestamp)| TimedKvEvent {
event,
timestamp_us: timestamp.saturating_duration_since(start).as_micros() as u64,
})
.collect(),
})
}
pub async fn generate_replay_artifacts(
traces: &[Trace],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<WorkerReplayArtifacts>> {
println!("Generating events...");
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?;
let progress = make_progress_bar(Some(
traces
.iter()
.map(|trace| {
trace
.sessions
.iter()
.map(|session| session.turns.len() as u64)
.sum::<u64>()
})
.sum::<u64>(),
));
collector.get_events() let mut tasks: Vec<JoinHandle<anyhow::Result<WorkerReplayArtifacts>>> = Vec::new();
for trace in traces.iter().cloned() {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
replay_worker_trace(trace, sched_args, trace_simulation_duration_ms, progress).await
})); }));
} }
let mut events = Vec::new(); let mut artifacts = Vec::new();
for task in tasks { for task in tasks {
events.push(task.await?); artifacts.push(task.await??);
} }
for worker_events in &events { for worker_events in artifacts.iter().map(|artifact| &artifact.kv_events) {
for i in 1..worker_events.len() { for i in 1..worker_events.len() {
assert!(worker_events[i].1 >= worker_events[i - 1].1); assert!(worker_events[i].timestamp_us >= worker_events[i - 1].timestamp_us);
} }
} }
println!( println!(
"Generated {} events. Processing...", "Generated {} events. Processing...",
events.iter().map(|e| e.len()).sum::<usize>() artifacts
.iter()
.map(|artifact| artifact.kv_events.len())
.sum::<usize>()
); );
if progress.elapsed() > Duration::from_millis(trace_simulation_duration_ms * 11 / 10) { if progress.elapsed() > Duration::from_millis(trace_simulation_duration_ms * 11 / 10) {
@@ -432,8 +533,11 @@ pub async fn generate_kv_events(
let mut num_stored_events = 0; let mut num_stored_events = 0;
let mut num_removed_events = 0; let mut num_removed_events = 0;
for event in events.iter().flatten() { for event in artifacts
match event.0.data { .iter()
.flat_map(|artifact| artifact.kv_events.iter())
{
match event.event.data {
KvCacheEventData::Stored(_) => num_stored_events += 1, KvCacheEventData::Stored(_) => num_stored_events += 1,
KvCacheEventData::Removed(_) => num_removed_events += 1, KvCacheEventData::Removed(_) => num_removed_events += 1,
_ => (), _ => (),
@@ -443,7 +547,25 @@ pub async fn generate_kv_events(
println!("Store events: {}", num_stored_events); println!("Store events: {}", num_stored_events);
println!("Remove events: {}", num_removed_events); println!("Remove events: {}", num_removed_events);
Ok(events) Ok(artifacts)
}
pub async fn generate_kv_events(
traces: &[Trace],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<TimedKvEvent>>> {
Ok(generate_replay_artifacts(
traces,
num_gpu_blocks,
block_size,
trace_simulation_duration_ms,
)
.await?
.into_iter()
.map(|artifact| artifact.kv_events)
.collect())
} }
pub fn plot_sweep( pub fn plot_sweep(
@@ -591,6 +713,16 @@ pub struct SequenceData {
pub external_hashes: Vec<ExternalSequenceBlockHash>, pub external_hashes: Vec<ExternalSequenceBlockHash>,
} }
impl From<RouterSequence> for SequenceData {
fn from(sequence: RouterSequence) -> Self {
Self {
worker_id: sequence.worker_id,
local_hashes: sequence.local_hashes,
external_hashes: sequence.external_hashes,
}
}
}
impl SequenceData { impl SequenceData {
/// Create a new sequence with synthetic hashes based on sequence ID. /// Create a new sequence with synthetic hashes based on sequence ID.
pub fn new(seq_id: u64, worker_id: WorkerId, depth: usize) -> Self { pub fn new(seq_id: u64, worker_id: WorkerId, depth: usize) -> Self {
@@ -673,58 +805,46 @@ pub fn generate_sequences(
seed: u64, seed: u64,
use_cumulative_hash: bool, use_cumulative_hash: bool,
) -> Vec<SequenceData> { ) -> Vec<SequenceData> {
let mut sequences = Vec::with_capacity(num_sequences); let trace = Trace::synthetic(SyntheticTraceSpec {
let prefix_length = (depth as f64 * prefix_ratio).round() as usize; block_size: 1,
let mut rng: StdRng = StdRng::seed_from_u64(seed); num_sessions: num_sequences,
turns_per_session: 1,
for seq_id in 0..num_sequences { input_tokens: LengthSpec {
let seq_id_u64 = seq_id as u64; mean: depth,
let worker_id = (seq_id % num_workers) as WorkerId; stddev: 0.0,
},
let group_id = if num_prefix_groups > 0 && prefix_length > 0 { output_tokens: LengthSpec {
Some(rng.random_range(0..num_prefix_groups) as u64) mean: 1,
stddev: 0.0,
},
shared_prefix_ratio: prefix_ratio,
num_prefix_groups,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: DelaySpec::None,
seed,
})
.expect("sequence generation spec must be valid");
let hash_mode = if use_cumulative_hash {
SequenceHashMode::Cumulative
} else { } else {
None SequenceHashMode::Raw
}; };
let local_hashes: Vec<LocalBlockHash> = (0..depth) trace
.map(|block_idx| { .partition_by_session(SessionPartitionSpec::RoundRobin {
let block_idx_u64 = block_idx as u64; num_partitions: num_workers,
if let Some(gid) = group_id
&& block_idx < prefix_length
{
return LocalBlockHash(0xDEAD_BEEF_0000_0000 | (gid << 32) | block_idx_u64);
}
LocalBlockHash((seq_id_u64 << 32) | block_idx_u64)
}) })
.collect(); .into_iter()
.enumerate()
if use_cumulative_hash { .flat_map(|(worker_idx, partition)| {
sequences.push(SequenceData::from_local_hashes(worker_id, local_hashes)); partition
} else { .to_router_sequences(worker_idx as WorkerId, hash_mode)
let external_hashes: Vec<ExternalSequenceBlockHash> = (0..depth) .expect("synthetic trace conversion must succeed")
.map(|block_idx| { .into_iter()
let block_idx_u64 = block_idx as u64; .map(SequenceData::from)
if let Some(gid) = group_id .collect::<Vec<_>>()
&& block_idx < prefix_length
{
return ExternalSequenceBlockHash(
0xDEAD_BEEF_0000_0000 | (gid << 32) | block_idx_u64,
);
}
ExternalSequenceBlockHash((seq_id_u64 << 32) | block_idx_u64)
}) })
.collect(); .collect()
sequences.push(SequenceData {
worker_id,
local_hashes,
external_hashes,
});
}
}
sequences
} }
/// Compute median of durations. /// Compute median of durations.
@@ -736,3 +856,60 @@ pub fn median(durations: &[Duration]) -> Duration {
sorted.sort(); sorted.sort();
sorted[sorted.len() / 2] sorted[sorted.len() / 2]
} }
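The visible tail of `median` sorts a copy of the durations and indexes `sorted.len() / 2`, which for an even-length input selects the upper of the two middle elements. A minimal standalone sketch of that behavior follows. The name `median_duration` and the `Option` return for empty input are assumptions of this sketch; the diff does not show how the original handles an empty slice.

```rust
use std::time::Duration;

/// Upper median: sort a copy and take the element at index `len / 2`.
/// Returns `None` for an empty slice (an assumption of this sketch).
fn median_duration(durations: &[Duration]) -> Option<Duration> {
    if durations.is_empty() {
        return None;
    }
    let mut sorted = durations.to_vec();
    sorted.sort();
    // For an even length this picks the larger of the two middle values.
    Some(sorted[sorted.len() / 2])
}
```

With four samples of 1, 2, 3, and 4 ms this returns 3 ms rather than the 2.5 ms midpoint, which may matter when comparing against tools that average the two middle values.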
#[cfg(test)]
mod tests {
use super::*;
fn multiturn_trace() -> Trace {
Trace {
block_size: 2,
sessions: vec![dynamo_mocker::loadgen::SessionTrace {
session_id: "session-a".to_string(),
first_arrival_timestamp_ms: Some(0.0),
turns: vec![
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
},
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![3, 4],
delay_after_previous_ms: 5.0,
},
],
}],
}
}
#[tokio::test]
async fn test_replay_worker_trace_releases_follow_up_turn_after_completion_delay() {
let artifacts = replay_worker_trace(
multiturn_trace(),
default_mock_engine_args(1024, 2).unwrap(),
5,
make_progress_bar(Some(2)),
)
.await
.unwrap();
assert_eq!(artifacts.requests.len(), 2);
let first_uuid = artifacts.requests[0].uuid;
let first_completion_ms = artifacts
.output_signals
.iter()
.find(|signal| signal.signal.uuid == first_uuid && signal.signal.completed)
.unwrap()
.timestamp_us as f64
/ 1000.0;
assert!(
artifacts.requests[1].scheduled_ready_at_ms + 0.1 >= first_completion_ms + 5.0,
"expected follow-up turn to wait for completion plus delay, got ready_at={} completion_at={}",
artifacts.requests[1].scheduled_ready_at_ms,
first_completion_ms
);
}
}
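The test above asserts that a follow-up turn is released only after the previous turn completes plus its `delay_after_previous_ms`. The gating rule can be sketched in isolation as follows. `SessionDriver` here is a hypothetical stand-in written for illustration, not the driver returned by `into_trace_driver()`, whose internals are not shown in this diff.

```rust
/// Hypothetical single-session driver: turn 0 is ready at the session's first
/// arrival; each later turn becomes ready only after the previous turn
/// completes plus that turn's delay.
struct SessionDriver {
    /// One entry per turn; entry 0 is unused because turn 0 has no predecessor.
    delays_after_previous_ms: Vec<f64>,
    next_turn: usize,
    next_ready_at_ms: Option<f64>,
}

impl SessionDriver {
    fn new(first_arrival_ms: f64, delays_after_previous_ms: Vec<f64>) -> Self {
        let next_ready_at_ms = if delays_after_previous_ms.is_empty() {
            None
        } else {
            Some(first_arrival_ms)
        };
        Self {
            delays_after_previous_ms,
            next_turn: 0,
            next_ready_at_ms,
        }
    }

    /// Returns the index of the turn to send if one is ready at `now_ms`.
    fn pop_ready(&mut self, now_ms: f64) -> Option<usize> {
        let ready_at = self.next_ready_at_ms?;
        if ready_at > now_ms {
            return None;
        }
        // Block the session until `on_complete` schedules the follow-up.
        self.next_ready_at_ms = None;
        let turn = self.next_turn;
        self.next_turn += 1;
        Some(turn)
    }

    /// Records completion of the in-flight turn and schedules the next one.
    fn on_complete(&mut self, now_ms: f64) {
        if let Some(delay) = self.delays_after_previous_ms.get(self.next_turn) {
            self.next_ready_at_ms = Some(now_ms + *delay);
        }
    }
}
```

In this sketch, a two-turn session with a 5 ms delay whose first turn completes at 12 ms releases the second turn at 17 ms and not before, mirroring the inequality the test checks.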
@@ -14,6 +14,7 @@ use dynamo_kv_router::protocols::{KvCacheEvent, KvCacheEventData, RouterEvent};
use dynamo_kv_router::{ use dynamo_kv_router::{
ConcurrentRadixTree, ConcurrentRadixTreeCompressed, PositionalIndexer, ThreadPoolIndexer, ConcurrentRadixTree, ConcurrentRadixTreeCompressed, PositionalIndexer, ThreadPoolIndexer,
}; };
use dynamo_mocker::loadgen::Trace;
use serde::Serialize; use serde::Serialize;
use std::sync::Arc; use std::sync::Arc;
use tokio::time::{Duration, Instant}; use tokio::time::{Duration, Instant};
@@ -194,68 +195,33 @@ struct WorkerTrace {
/// Timestamps are rescaled from the original trace / simulation durations /// Timestamps are rescaled from the original trace / simulation durations
/// into the benchmark duration (microseconds). /// into the benchmark duration (microseconds).
fn prepare_worker_traces( fn prepare_worker_traces(
traces: Vec<Vec<MooncakeRequest>>, artifacts: Vec<WorkerReplayArtifacts>,
events: Vec<Vec<(KvCacheEvent, Instant)>>,
block_size: u32,
benchmark_duration_ms: u64, benchmark_duration_ms: u64,
trace_simulation_duration_ms: u64,
) -> Vec<Vec<WorkerTrace>> { ) -> Vec<Vec<WorkerTrace>> {
assert!(traces.len() == events.len()); artifacts
let scaled_request_traces: Vec<_> = traces
.into_iter() .into_iter()
.map(|trace| { .map(|artifact| {
let Some(first) = trace.first() else { let mut merged = artifact
return Vec::new(); .requests
};
let first_ts = first.timestamp;
let trace_duration_ms = trace.last().unwrap().timestamp - first_ts;
trace
.into_iter() .into_iter()
.map(|request| WorkerTrace { .map(|request| WorkerTrace {
timestamp_us: if trace_duration_ms == 0 { timestamp_us: request.timestamp_us,
entry: WorkerTraceEntry::Request(request.replay_hashes.local_block_hashes),
})
.chain(artifact.kv_events.into_iter().map(|event| WorkerTrace {
timestamp_us: event.timestamp_us,
entry: WorkerTraceEntry::Event(event.event),
}))
.collect::<Vec<_>>();
merged.sort_by_key(|entry| entry.timestamp_us);
let max_timestamp_us = merged.last().map(|entry| entry.timestamp_us).unwrap_or(0);
for entry in &mut merged {
entry.timestamp_us = if max_timestamp_us == 0 {
0 0
} else { } else {
(request.timestamp - first_ts) * 1000 * benchmark_duration_ms entry.timestamp_us * benchmark_duration_ms * 1000 / max_timestamp_us
/ trace_duration_ms
},
entry: WorkerTraceEntry::Request(
request
.hash_ids
.iter()
.map(|id| local_block_hash_from_id(*id, block_size))
.collect(),
),
})
.collect::<Vec<_>>()
})
.collect();
let scaled_event_traces: Vec<_> = events
.into_iter()
.map(|worker_events| {
let Some(&(_, start_instant)) = worker_events.first() else {
return Vec::new();
}; };
worker_events }
.into_iter()
.map(|(event, timestamp)| WorkerTrace {
timestamp_us: (timestamp - start_instant).as_micros() as u64
* benchmark_duration_ms
/ trace_simulation_duration_ms,
entry: WorkerTraceEntry::Event(event),
})
.collect::<Vec<_>>()
})
.collect();
scaled_request_traces
.into_iter()
.zip(scaled_event_traces)
.map(|(request_trace, event_trace)| {
let mut merged: Vec<WorkerTrace> =
request_trace.into_iter().chain(event_trace).collect();
merged.sort_by_key(|entry| entry.timestamp_us);
merged merged
}) })
.collect() .collect()
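The new rescaling step in `prepare_worker_traces` maps each merged timestamp onto the benchmark window by multiplying by `benchmark_duration_ms * 1000` and dividing by the largest timestamp. A minimal standalone sketch of that arithmetic follows. The function name `rescale_timestamps_us` and the `u128` intermediate are assumptions of this sketch; the diff performs the multiplication in `u64`.

```rust
/// Linearly rescale timestamps (microseconds) so that the largest one lands
/// at `benchmark_duration_ms`. If every timestamp is zero, all stay zero.
fn rescale_timestamps_us(timestamps_us: &mut [u64], benchmark_duration_ms: u64) {
    let max_timestamp_us = timestamps_us.iter().copied().max().unwrap_or(0);
    // Target window length in microseconds, widened to avoid overflow.
    let target_us = benchmark_duration_ms as u128 * 1000;
    for ts in timestamps_us.iter_mut() {
        *ts = if max_timestamp_us == 0 {
            0
        } else {
            (*ts as u128 * target_us / max_timestamp_us as u128) as u64
        };
    }
}
```

Because the scale factor is the same for requests and KV events, their relative order within a worker is preserved, which is presumably why the diff merges and sorts before rescaling.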
@@ -276,19 +242,12 @@ struct SweepStepResult {
/// flushed and latency percentiles / throughput stats are printed. /// flushed and latency percentiles / throughput stats are printed.
async fn run_benchmark( async fn run_benchmark(
indexer: Arc<dyn KvIndexerInterface + Send + Sync>, indexer: Arc<dyn KvIndexerInterface + Send + Sync>,
traces: Vec<Vec<MooncakeRequest>>, artifacts: Vec<WorkerReplayArtifacts>,
events: Vec<Vec<(KvCacheEvent, Instant)>>,
args: &Args, args: &Args,
benchmark_duration_ms: u64, benchmark_duration_ms: u64,
count_events: bool, count_events: bool,
) -> anyhow::Result<BenchmarkResults> { ) -> anyhow::Result<BenchmarkResults> {
let worker_traces = prepare_worker_traces( let worker_traces = prepare_worker_traces(artifacts, benchmark_duration_ms);
traces,
events,
args.common.block_size,
benchmark_duration_ms,
args.common.trace_simulation_duration_ms,
);
let worker_traces = worker_traces.into_iter().map(Arc::new).collect::<Vec<_>>(); let worker_traces = worker_traces.into_iter().map(Arc::new).collect::<Vec<_>>();
let progress = make_progress_bar(Some( let progress = make_progress_bar(Some(
@@ -460,7 +419,7 @@ async fn run_benchmark(
}) })
} }
fn run_tests() -> anyhow::Result<()> { async fn run_tests() -> anyhow::Result<()> {
use std::collections::HashSet; use std::collections::HashSet;
use std::fs::File; use std::fs::File;
use std::io::Write; use std::io::Write;
@@ -479,6 +438,7 @@ fn run_tests() -> anyhow::Result<()> {
"{}", "{}",
serde_json::json!({ serde_json::json!({
"timestamp": i as u64, "timestamp": i as u64,
"input_length": hash_ids.len(),
"hash_ids": hash_ids, "hash_ids": hash_ids,
"output_length": output_length, "output_length": output_length,
}) })
@@ -486,12 +446,13 @@ fn run_tests() -> anyhow::Result<()> {
} }
} }
let traces = process_mooncake_trace(path.to_str().unwrap(), 2, 2, 2, 42)?; let traces = process_mooncake_trace(path.to_str().unwrap(), 512, 2, 2, 2, 42)?;
std::fs::remove_file(&path).ok(); std::fs::remove_file(&path).ok();
let mut all_hashes: Vec<Vec<u64>> = traces let mut all_hashes: Vec<Vec<u64>> = traces
.into_iter() .into_iter()
.flat_map(|w| w.into_iter().map(|r| r.hash_ids)) .flat_map(|worker| worker.sessions.into_iter())
.flat_map(|session| session.turns.into_iter().map(|turn| turn.hash_ids))
.collect(); .collect();
all_hashes.sort(); all_hashes.sort();
@@ -519,6 +480,43 @@ fn run_tests() -> anyhow::Result<()> {
let set1: HashSet<u64> = copy1.iter().flat_map(|h| h.iter().copied()).collect(); let set1: HashSet<u64> = copy1.iter().flat_map(|h| h.iter().copied()).collect();
assert!(set0.is_disjoint(&set1), "copies are not hash-disjoint"); assert!(set0.is_disjoint(&set1), "copies are not hash-disjoint");
let replay_trace = Trace {
block_size: 2,
sessions: vec![dynamo_mocker::loadgen::SessionTrace {
session_id: "session-a".to_string(),
first_arrival_timestamp_ms: Some(0.0),
turns: vec![
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
},
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![3, 4],
delay_after_previous_ms: 5.0,
},
],
}],
};
let artifacts = generate_replay_artifacts(&[replay_trace], 1024, 2, 5).await?;
assert_eq!(artifacts.len(), 1);
assert_eq!(artifacts[0].requests.len(), 2);
let first_uuid = artifacts[0].requests[0].uuid;
let first_completion_ms = artifacts[0]
.output_signals
.iter()
.find(|signal| signal.signal.uuid == first_uuid && signal.signal.completed)
.expect("first request must complete")
.timestamp_us as f64
/ 1000.0;
assert!(
artifacts[0].requests[1].scheduled_ready_at_ms + 0.1 >= first_completion_ms + 5.0,
"expected second request to wait for completion plus delay"
);
println!("All tests passed."); println!("All tests passed.");
Ok(()) Ok(())
} }
@@ -528,7 +526,7 @@ async fn main() -> anyhow::Result<()> {
let args = Args::parse(); let args = Args::parse();
if args.common.test { if args.common.test {
return run_tests(); return run_tests().await;
} }
let path = match args.common.mooncake_trace_path.as_deref() { let path = match args.common.mooncake_trace_path.as_deref() {
@@ -540,12 +538,13 @@ async fn main() -> anyhow::Result<()> {
}; };
let traces = process_mooncake_trace( let traces = process_mooncake_trace(
path, path,
args.common.block_size,
args.common.trace_length_factor, args.common.trace_length_factor,
args.common.trace_duplication_factor, args.common.trace_duplication_factor,
args.common.num_unique_inference_workers, args.common.num_unique_inference_workers,
args.common.seed, args.common.seed,
)?; )?;
let events = generate_kv_events( let artifacts = generate_replay_artifacts(
&traces, &traces,
args.common.num_gpu_blocks, args.common.num_gpu_blocks,
args.common.block_size, args.common.block_size,
@@ -599,15 +598,8 @@ async fn main() -> anyhow::Result<()> {
IndexerArgs::from_name(name, args.common.block_size, args.num_event_workers)? IndexerArgs::from_name(name, args.common.block_size, args.num_event_workers)?
}; };
let count_events = IndexerArgs::supports_remove(name); let count_events = IndexerArgs::supports_remove(name);
let result = run_benchmark( let result =
indexer, run_benchmark(indexer, artifacts.clone(), &args, dur_ms, count_events).await?;
traces.clone(),
events.clone(),
&args,
dur_ms,
count_events,
)
.await?;
if multi_threaded { if multi_threaded {
if result.block_throughput >= result.offered_block_throughput * 0.95 { if result.block_throughput >= result.offered_block_throughput * 0.95 {
@@ -674,8 +666,7 @@ async fn main() -> anyhow::Result<()> {
let count_events = IndexerArgs::supports_remove(name); let count_events = IndexerArgs::supports_remove(name);
run_benchmark( run_benchmark(
indexer, indexer,
traces.clone(), artifacts.clone(),
events.clone(),
&args, &args,
args.common.benchmark_duration_ms, args.common.benchmark_duration_ms,
count_events, count_events,
......
@@ -13,6 +13,9 @@
use anyhow::{Context, Result}; use anyhow::{Context, Result};
use clap::Parser; use clap::Parser;
use dynamo_bench::common::{ChatMessage, LatencyStats, fetch_model_name}; use dynamo_bench::common::{ChatMessage, LatencyStats, fetch_model_name};
use dynamo_mocker::loadgen::{
ArrivalSpec, DelaySpec, LengthSpec, SessionTrace, SyntheticTraceSpec, Trace,
};
use futures_util::StreamExt; use futures_util::StreamExt;
use indicatif::{ProgressBar, ProgressStyle}; use indicatif::{ProgressBar, ProgressStyle};
use rand::rngs::StdRng; use rand::rngs::StdRng;
@@ -283,10 +286,10 @@ async fn run_user(
model: String, model: String,
args: Arc<Args>, args: Arc<Args>,
user_id: usize, user_id: usize,
session: SessionTrace,
progress: ProgressBar, progress: ProgressBar,
) -> Vec<TurnResult> { ) -> Vec<TurnResult> {
let mut rng = StdRng::seed_from_u64(args.seed.wrapping_add(user_id as u64)); let mut rng = StdRng::seed_from_u64(args.seed.wrapping_add(user_id as u64));
let mean_delay = args.mean_delay_ms as f64;
let system_prompt = generate_system_prompt(user_id); let system_prompt = generate_system_prompt(user_id);
let mut messages = vec![ChatMessage { let mut messages = vec![ChatMessage {
@@ -294,11 +297,10 @@ async fn run_user(
content: system_prompt, content: system_prompt,
}]; }];
let mut results = Vec::with_capacity(args.num_turns); let mut results = Vec::with_capacity(session.turns.len());
for turn in 0..args.num_turns { for (turn, turn_spec) in session.turns.iter().enumerate() {
// Generate user prompt let user_text = generate_lorem(&mut rng, turn_spec.input_length);
let user_text = generate_lorem(&mut rng, args.num_user_tokens);
messages.push(ChatMessage { messages.push(ChatMessage {
role: "user".to_string(), role: "user".to_string(),
content: user_text, content: user_text,
@@ -307,7 +309,7 @@ async fn run_user(
let body = MultiturnRequest { let body = MultiturnRequest {
model: model.clone(), model: model.clone(),
messages: messages.clone(), messages: messages.clone(),
max_completion_tokens: args.max_completion_tokens, max_completion_tokens: turn_spec.max_output_tokens as u32,
ignore_eos: if args.ignore_eos { Some(true) } else { None }, ignore_eos: if args.ignore_eos { Some(true) } else { None },
stream: true, stream: true,
nvext: if args.speculative_prefill { nvext: if args.speculative_prefill {
@@ -392,7 +394,7 @@ async fn run_user(
" [user {}][turn {}/{}] ttft={:.1}ms total={:.1}s ok={}", " [user {}][turn {}/{}] ttft={:.1}ms total={:.1}s ok={}",
user_id, user_id,
turn + 1, turn + 1,
args.num_turns, session.turns.len(),
result.ttft_us as f64 / 1000.0, result.ttft_us as f64 / 1000.0,
result.total_latency_us as f64 / 1_000_000.0, result.total_latency_us as f64 / 1_000_000.0,
result.success, result.success,
@@ -404,10 +406,13 @@ async fn run_user(
// Exponential inter-turn delay (skip after last turn) // Exponential inter-turn delay (skip after last turn)
// Exp(1/mean) = -mean * ln(U), U ~ Uniform(0,1) // Exp(1/mean) = -mean * ln(U), U ~ Uniform(0,1)
if turn + 1 < args.num_turns { if let Some(next_turn) = session.turns.get(turn + 1)
let u: f64 = rng.random(); && next_turn.delay_after_previous_ms > 0.0
let delay_ms = (-mean_delay * u.ln()).max(0.0); {
tokio::time::sleep(Duration::from_millis(delay_ms as u64)).await; tokio::time::sleep(Duration::from_secs_f64(
next_turn.delay_after_previous_ms / 1000.0,
))
.await;
} }
} }
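The retained comment describes exponential inter-turn delays via inverse-transform sampling, while the updated loop sleeps on a precomputed `delay_after_previous_ms`; the sampling presumably now happens when `Trace::synthetic` expands `DelaySpec::ExponentialMs`, though that code is not part of this diff. A minimal sketch of the formula in the comment follows. `exponential_delay_ms` is a hypothetical helper, and clamping `u` away from zero is an assumption added here because `ln(0)` is negative infinity.

```rust
/// Inverse-transform sample of an exponential delay with the given mean:
/// delay = -mean * ln(u), where u is uniform on (0, 1].
fn exponential_delay_ms(mean_ms: f64, u: f64) -> f64 {
    // Clamp so that u == 0.0 cannot produce an infinite delay.
    let u = u.clamp(f64::MIN_POSITIVE, 1.0);
    (-mean_ms * u.ln()).max(0.0)
}
```

Passing the uniform draw in as an argument keeps the sketch deterministic; a caller would supply `u` from whatever seeded generator it already uses.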
@@ -569,6 +574,32 @@ async fn main() -> Result<()> {
.build() .build()
.context("Failed to create HTTP client")?; .context("Failed to create HTTP client")?;
let workload = Trace::synthetic(SyntheticTraceSpec {
block_size: 1,
num_sessions: args.num_users,
turns_per_session: args.num_turns,
input_tokens: LengthSpec {
mean: args.num_user_tokens,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: args.max_completion_tokens as usize,
stddev: 0.0,
},
shared_prefix_ratio: 0.0,
num_prefix_groups: 0,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: if args.mean_delay_ms == 0 {
DelaySpec::None
} else {
DelaySpec::ExponentialMs {
mean_ms: args.mean_delay_ms as f64,
}
},
seed: args.seed,
})?;
let sessions = workload.sessions;
let args = Arc::new(args); let args = Arc::new(args);
let chat_url = format!("{}/v1/chat/completions", args.url); let chat_url = format!("{}/v1/chat/completions", args.url);
@@ -592,14 +623,18 @@ async fn main() -> Result<()> {
.progress_chars("#>-"), .progress_chars("#>-"),
); );
let handles: Vec<_> = (0..args.num_users) let handles: Vec<_> = sessions
.map(|user_id| { .into_iter()
.enumerate()
.map(|(user_id, session)| {
let client = client.clone(); let client = client.clone();
let url = chat_url.clone(); let url = chat_url.clone();
let model = model.clone(); let model = model.clone();
let args = args.clone(); let args = args.clone();
let progress = progress.clone(); let progress = progress.clone();
tokio::spawn(async move { run_user(client, url, model, args, user_id, progress).await }) tokio::spawn(async move {
run_user(client, url, model, args, user_id, session, progress).await
})
}) })
.collect(); .collect();
......
@@ -10,6 +10,9 @@ use dynamo_mocker::common::protocols::{
PreemptionMode as RsPreemptionMode, ReasoningConfig as RsReasoningConfig, PreemptionMode as RsPreemptionMode, ReasoningConfig as RsReasoningConfig,
SglangArgs as RsSglangArgs, WorkerType as RsWorkerType, SglangArgs as RsSglangArgs, WorkerType as RsWorkerType,
}; };
use dynamo_mocker::loadgen::{
ArrivalSpec, DelaySpec, LengthSpec, SyntheticTraceSpec, Trace as RsTrace,
};
use pyo3::{exceptions::PyException, prelude::*}; use pyo3::{exceptions::PyException, prelude::*};
use pythonize::pythonize; use pythonize::pythonize;
use uuid::Uuid; use uuid::Uuid;
@@ -356,7 +359,7 @@ pub fn run_mocker_trace_replay(
} }
#[pyfunction] #[pyfunction]
#[pyo3(signature = (input_tokens, output_tokens, request_count, extra_engine_args=None, router_config=None, num_workers=1, replay_concurrency=None, replay_mode="offline", router_mode="round_robin", arrival_speedup_ratio=1.0, arrival_interval_ms=1.0))] #[pyo3(signature = (input_tokens, output_tokens, request_count, extra_engine_args=None, router_config=None, num_workers=1, replay_concurrency=None, replay_mode="offline", router_mode="round_robin", arrival_speedup_ratio=1.0, arrival_interval_ms=1.0, turns_per_session=1, shared_prefix_ratio=0.0, num_prefix_groups=0, inter_turn_delay_ms=0.0))]
#[allow(clippy::too_many_arguments)] #[allow(clippy::too_many_arguments)]
pub fn run_mocker_synthetic_trace_replay( pub fn run_mocker_synthetic_trace_replay(
py: Python<'_>, py: Python<'_>,
@@ -371,6 +374,10 @@ pub fn run_mocker_synthetic_trace_replay(
router_mode: &str, router_mode: &str,
arrival_speedup_ratio: f64, arrival_speedup_ratio: f64,
arrival_interval_ms: f64, arrival_interval_ms: f64,
turns_per_session: usize,
shared_prefix_ratio: f64,
num_prefix_groups: usize,
inter_turn_delay_ms: f64,
) -> PyResult<PyObject> { ) -> PyResult<PyObject> {
let args = load_replay_mocker_args(py, extra_engine_args)?; let args = load_replay_mocker_args(py, extra_engine_args)?;
let router_config = load_replay_router_config(router_config); let router_config = load_replay_router_config(router_config);
@@ -378,6 +385,73 @@ pub fn run_mocker_synthetic_trace_replay(
let router_mode = parse_replay_router_mode(router_mode)?; let router_mode = parse_replay_router_mode(router_mode)?;
let report = py.allow_threads(move || { let report = py.allow_threads(move || {
let replay_concurrency = parse_replay_concurrency(replay_concurrency)?; let replay_concurrency = parse_replay_concurrency(replay_concurrency)?;
let use_workload = turns_per_session > 1
|| shared_prefix_ratio > 0.0
|| num_prefix_groups > 0
|| inter_turn_delay_ms > 0.0;
if use_workload {
let mut trace = build_synthetic_workload(
args.block_size.max(1),
input_tokens,
output_tokens,
request_count,
arrival_interval_ms,
turns_per_session,
shared_prefix_ratio,
num_prefix_groups,
inter_turn_delay_ms,
)?;
if replay_concurrency.is_none() {
trace = trace.speed_up_timing(arrival_speedup_ratio)?;
}
return match (replay_mode.as_str(), replay_concurrency) {
("offline", Some(max_in_flight)) => {
dynamo_mocker::replay::simulate_concurrency_workload_with_router_mode(
args,
router_config.clone(),
trace,
max_in_flight,
num_workers,
router_mode,
)
}
("offline", None) => {
dynamo_mocker::replay::simulate_trace_workload_with_router_mode(
args,
router_config.clone(),
trace,
num_workers,
router_mode,
)
}
("online", Some(max_in_flight)) => {
dynamo_mocker::replay::simulate_concurrency_live_workload_with_router_mode(
args,
router_config.clone(),
trace,
max_in_flight,
num_workers,
router_mode,
)
}
("online", None) => {
dynamo_mocker::replay::simulate_trace_live_workload_with_router_mode(
args,
router_config.clone(),
trace,
num_workers,
router_mode,
)
}
(other, _) => anyhow::bail!(
"replay_mode must be either 'offline' or 'online', got '{}'",
other
),
};
}
let requests = build_synthetic_requests( let requests = build_synthetic_requests(
input_tokens, input_tokens,
output_tokens, output_tokens,
@@ -509,6 +583,69 @@ fn parse_replay_concurrency(replay_concurrency: Option<isize>) -> anyhow::Result
} }
} }
#[allow(clippy::too_many_arguments)]
fn build_synthetic_workload(
block_size: usize,
input_tokens: usize,
output_tokens: usize,
request_count: usize,
arrival_interval_ms: f64,
turns_per_session: usize,
shared_prefix_ratio: f64,
num_prefix_groups: usize,
inter_turn_delay_ms: f64,
) -> anyhow::Result<RsTrace> {
if input_tokens == 0 {
anyhow::bail!("input_tokens must be at least 1");
}
if output_tokens == 0 {
anyhow::bail!("output_tokens must be at least 1");
}
if request_count == 0 {
anyhow::bail!("request_count must be at least 1");
}
if turns_per_session == 0 {
anyhow::bail!("turns_per_session must be at least 1");
}
if !arrival_interval_ms.is_finite() || arrival_interval_ms < 0.0 {
anyhow::bail!("arrival_interval_ms must be a finite non-negative number");
}
if !inter_turn_delay_ms.is_finite() || inter_turn_delay_ms < 0.0 {
anyhow::bail!("inter_turn_delay_ms must be a finite non-negative number");
}
let first_turn_arrivals = if arrival_interval_ms == 0.0 {
ArrivalSpec::Burst
} else {
ArrivalSpec::ConstantQps {
qps: 1000.0 / arrival_interval_ms,
}
};
RsTrace::synthetic(SyntheticTraceSpec {
block_size,
num_sessions: request_count,
turns_per_session,
input_tokens: LengthSpec {
mean: input_tokens,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: output_tokens,
stddev: 0.0,
},
shared_prefix_ratio,
num_prefix_groups,
first_turn_arrivals,
inter_turn_delays: if inter_turn_delay_ms == 0.0 {
DelaySpec::None
} else {
DelaySpec::ConstantMs(inter_turn_delay_ms)
},
seed: 42,
})
}
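The interval-to-rate conversion in `build_synthetic_workload` can be sketched outside the Rust code. This is a minimal Python sketch, not the project's API: `Burst`, `ConstantQps`, and `first_turn_arrivals` are hypothetical names that only mirror `ArrivalSpec::Burst` and `ArrivalSpec::ConstantQps` from the diff above.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Burst:
    """Hypothetical stand-in for ArrivalSpec::Burst: all first turns arrive at once."""


@dataclass(frozen=True)
class ConstantQps:
    """Hypothetical stand-in for ArrivalSpec::ConstantQps."""
    qps: float


def first_turn_arrivals(arrival_interval_ms: float) -> Union[Burst, ConstantQps]:
    # Same validation as the Rust function: finite and non-negative.
    if not math.isfinite(arrival_interval_ms) or arrival_interval_ms < 0.0:
        raise ValueError("arrival_interval_ms must be a finite non-negative number")
    # A zero interval means "no spacing", so the workload degenerates to a burst.
    if arrival_interval_ms == 0.0:
        return Burst()
    # Otherwise one request every N milliseconds is 1000 / N requests per second.
    return ConstantQps(qps=1000.0 / arrival_interval_ms)
```

With the CLI default of `--arrival-interval-ms 1.0`, this conversion corresponds to 1000 first-turn arrivals per second.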
fn build_synthetic_requests(
input_tokens: usize,
output_tokens: usize,
......
@@ -1388,6 +1388,10 @@ def run_mocker_synthetic_trace_replay(
router_mode: Literal["round_robin", "kv_router"] = "round_robin",
arrival_speedup_ratio: float = 1.0,
arrival_interval_ms: float = 1.0,
turns_per_session: int = 1,
shared_prefix_ratio: float = 0.0,
num_prefix_groups: int = 0,
inter_turn_delay_ms: float = 0.0,
) -> Dict[str, Any]:
"""Replay a synthetic mocker workload without requiring a trace file."""
...
......
@@ -43,6 +43,10 @@ def run_synthetic_trace_replay(
router_mode="round_robin",
arrival_speedup_ratio=1.0,
arrival_interval_ms=1.0,
turns_per_session=1,
shared_prefix_ratio=0.0,
num_prefix_groups=0,
inter_turn_delay_ms=0.0,
):
return _run_mocker_synthetic_trace_replay(
input_tokens,
@@ -56,4 +60,8 @@ def run_synthetic_trace_replay(
router_mode=router_mode,
arrival_speedup_ratio=arrival_speedup_ratio,
arrival_interval_ms=arrival_interval_ms,
turns_per_session=turns_per_session,
shared_prefix_ratio=shared_prefix_ratio,
num_prefix_groups=num_prefix_groups,
inter_turn_delay_ms=inter_turn_delay_ms,
)
@@ -22,8 +22,16 @@ def main(argv: Sequence[str] | None = None) -> int:
parser.add_argument("--router-config")
parser.add_argument("--input-tokens", type=int)
parser.add_argument("--output-tokens", type=int)
-parser.add_argument("--request-count", type=int)
+parser.add_argument(
"--request-count",
type=int,
help="number of synthetic requests; when --turns-per-session > 1, this is the number of sessions",
)
parser.add_argument("--arrival-interval-ms", type=float, default=1.0)
parser.add_argument("--turns-per-session", type=int, default=1)
parser.add_argument("--shared-prefix-ratio", type=float, default=0.0)
parser.add_argument("--num-prefix-groups", type=int, default=0)
parser.add_argument("--inter-turn-delay-ms", type=float, default=0.0)
parser.add_argument("--num-workers", type=int, default=1)
parser.add_argument("--replay-concurrency", type=int)
parser.add_argument(
@@ -45,7 +53,14 @@ def main(argv: Sequence[str] | None = None) -> int:
using_trace_file = args.trace_file is not None
synthetic_args = (args.input_tokens, args.output_tokens, args.request_count)
-using_synthetic = any(value is not None for value in synthetic_args)
+using_synthetic = any(value is not None for value in synthetic_args) or any(
(
args.turns_per_session != 1,
args.shared_prefix_ratio != 0.0,
args.num_prefix_groups != 0,
args.inter_turn_delay_ms != 0.0,
)
)
if using_trace_file == using_synthetic:
parser.error(
@@ -91,6 +106,10 @@ def main(argv: Sequence[str] | None = None) -> int:
router_mode=args.router_mode,
arrival_speedup_ratio=args.arrival_speedup_ratio,
arrival_interval_ms=args.arrival_interval_ms,
turns_per_session=args.turns_per_session,
shared_prefix_ratio=args.shared_prefix_ratio,
num_prefix_groups=args.num_prefix_groups,
inter_turn_delay_ms=args.inter_turn_delay_ms,
)
report_path = write_report_json(report, args.report_json)
......
@@ -110,6 +110,45 @@ def _write_trace_and_args(tmp_path):
return trace_path
def _write_multiturn_trace(tmp_path):
trace_path = tmp_path / "multiturn_trace.jsonl"
records = [
{
"session_id": "session-a",
"timestamp": 1000.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [101],
},
{
"session_id": "session-b",
"timestamp": 1002.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [202],
},
{
"session_id": "session-a",
"delay": 5.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [303],
},
{
"session_id": "session-b",
"delay": 1.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [404],
},
]
trace_path.write_text(
"\n".join(json.dumps(record) for record in records) + "\n",
encoding="utf-8",
)
return trace_path
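The fixture above interleaves turns from two sessions: the first record of a session carries `timestamp`, and later records carry `delay`. A minimal Python sketch of how such records could be grouped into sessions follows. `group_sessions` is a hypothetical helper, not the loader used by the replay code, and it assumes that records without a `session_id` each form their own single-turn session.

```python
import json


def group_sessions(jsonl_text):
    """Group Mooncake-style JSONL records into ordered per-session turn lists."""
    sessions = {}  # dicts keep insertion order, so sessions stay in first-seen order
    for line_no, line in enumerate(jsonl_text.splitlines()):
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: a record with no session_id is its own one-turn session.
        session_id = record.get("session_id", f"anonymous-{line_no}")
        session = sessions.setdefault(
            session_id, {"first_arrival_ms": None, "turns": []}
        )
        if not session["turns"]:
            # Only the first turn of a session has an absolute arrival time.
            session["first_arrival_ms"] = record.get("timestamp")
        session["turns"].append(
            {
                "input_length": record["input_length"],
                "output_length": record["output_length"],
                "hash_ids": record["hash_ids"],
                # Later turns are scheduled relative to the previous completion.
                "delay_ms": record.get("delay", 0.0),
            }
        )
    return sessions
```

Applied to the four records in `_write_multiturn_trace`, this yields two sessions of two turns each, which matches the `num_requests=4` expectation in the tests that use the fixture.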
def _write_cli_smoke_trace(tmp_path):
trace_path = tmp_path / "cli_smoke_trace.jsonl"
records = []
@@ -283,6 +322,26 @@ def test_run_trace_replay_invariant_counts_match(tmp_path, engine_type, replay_m
assert single[field] == multi_kv_router[field]
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_trace_replay_supports_multiturn_sessions(tmp_path, replay_mode):
trace_path = _write_multiturn_trace(tmp_path)
report = run_trace_replay(
trace_path,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode=replay_mode,
router_mode="kv_router",
)
_assert_basic_report_counts(
report,
num_requests=4,
input_tokens=64,
output_tokens=2,
)
@pytest.mark.parametrize("engine_type", ["vllm", "sglang"])
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
@pytest.mark.parametrize("router_mode", ["round_robin", "kv_router"])
@@ -358,6 +417,53 @@ def test_run_synthetic_trace_replay_invariant_counts_match(
assert single[field] == multi_kv_router[field]
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_synthetic_trace_replay_supports_multiturn_workloads(tmp_path, replay_mode):
report = run_synthetic_trace_replay(
64,
2,
3,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode=replay_mode,
router_mode="kv_router",
turns_per_session=2,
inter_turn_delay_ms=5.0,
shared_prefix_ratio=0.5,
num_prefix_groups=2,
)
_assert_basic_report_counts(
report,
num_requests=6,
input_tokens=64,
output_tokens=2,
)
@pytest.mark.parametrize(
("input_tokens", "output_tokens", "expected_message"),
[
(0, 2, "input_tokens must be at least 1"),
(2, 0, "output_tokens must be at least 1"),
],
)
def test_run_synthetic_trace_replay_workload_validates_zero_token_lengths(
input_tokens, output_tokens, expected_message
):
with pytest.raises(Exception, match=expected_message):
run_synthetic_trace_replay(
input_tokens,
output_tokens,
2,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode="offline",
router_mode="kv_router",
turns_per_session=2,
)
@pytest.mark.parametrize("engine_type", ["vllm", "sglang"])
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_synthetic_concurrency_replay_counts_match(
@@ -551,6 +657,48 @@ def test_replay_cli_prints_table_and_saves_json(tmp_path, monkeypatch, capsys):
assert json.loads(report_path.read_text(encoding="utf-8")) == report
def test_replay_cli_passes_multiturn_workload_kwargs(monkeypatch):
captured = {}
def fake_run(*args, **kwargs):
captured["args"] = args
captured["kwargs"] = kwargs
return {
"completed_requests": 4,
"request_throughput_rps": 1.0,
"output_throughput_tok_s": 1.0,
}
monkeypatch.setattr("dynamo.replay.main.run_synthetic_trace_replay", fake_run)
exit_code = main(
[
"--input-tokens",
"16",
"--output-tokens",
"8",
"--request-count",
"2",
"--turns-per-session",
"2",
"--shared-prefix-ratio",
"0.5",
"--num-prefix-groups",
"3",
"--inter-turn-delay-ms",
"7.0",
]
)
assert exit_code == 0
assert captured["args"] == (16, 8, 2)
assert captured["kwargs"]["turns_per_session"] == 2
assert captured["kwargs"]["shared_prefix_ratio"] == 0.5
assert captured["kwargs"]["num_prefix_groups"] == 3
assert captured["kwargs"]["inter_turn_delay_ms"] == 7.0
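The CLI decides between trace and synthetic mode by checking whether any workload flag differs from its default. A minimal, self-contained Python sketch of that detection follows. `build_parser` and `uses_synthetic` are hypothetical names, only a subset of the flags is reproduced, and treating `trace_file` as an optional positional argument is an assumption based on how the subprocess tests pass the trace path.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumption: the trace path is an optional positional argument.
    parser.add_argument("trace_file", nargs="?")
    parser.add_argument("--input-tokens", type=int)
    parser.add_argument("--output-tokens", type=int)
    parser.add_argument("--request-count", type=int)
    parser.add_argument("--turns-per-session", type=int, default=1)
    parser.add_argument("--shared-prefix-ratio", type=float, default=0.0)
    parser.add_argument("--num-prefix-groups", type=int, default=0)
    parser.add_argument("--inter-turn-delay-ms", type=float, default=0.0)
    return parser


def uses_synthetic(args):
    # The three size flags have no default, so "is not None" means "was passed".
    sized = (args.input_tokens, args.output_tokens, args.request_count)
    # The workload-shape flags have defaults, so they are compared against them.
    shaped = (
        args.turns_per_session != 1,
        args.shared_prefix_ratio != 0.0,
        args.num_prefix_groups != 0,
        args.inter_turn_delay_ms != 0.0,
    )
    return any(value is not None for value in sized) or any(shaped)
```

One consequence of comparing against defaults is that passing a flag explicitly at its default value, for example `--turns-per-session 1`, is indistinguishable from omitting it.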
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_synthetic_smoke(tmp_path):
report_path = tmp_path / "synthetic_report.json"
@@ -582,6 +730,45 @@ def test_replay_cli_subprocess_synthetic_smoke(tmp_path):
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_synthetic_multiturn_smoke(tmp_path):
report_path = tmp_path / "synthetic_multiturn_report.json"
completed = _run_replay_cli(
tmp_path,
"--input-tokens",
"64",
"--output-tokens",
"4",
"--request-count",
"3",
"--turns-per-session",
"2",
"--shared-prefix-ratio",
"0.5",
"--num-prefix-groups",
"2",
"--inter-turn-delay-ms",
"5.0",
"--num-workers",
"2",
"--report-json",
str(report_path),
"--extra-engine-args",
'{"block_size":64,"speedup_ratio":1000.0}',
)
report = _assert_replay_cli_outputs(completed, report_path)
_assert_basic_report_counts(
report,
num_requests=6,
input_tokens=64,
output_tokens=4,
)
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_trace_smoke(tmp_path):
trace_path = _write_cli_smoke_trace(tmp_path)
report_path = tmp_path / "trace_report.json"
@@ -609,3 +796,33 @@ def test_replay_cli_subprocess_trace_smoke(tmp_path):
output_tokens=25,
)
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_multiturn_trace_smoke(tmp_path):
trace_path = _write_multiturn_trace(tmp_path)
report_path = tmp_path / "multiturn_trace_report.json"
completed = _run_replay_cli(
tmp_path,
str(trace_path),
"--replay-mode",
"online",
"--router-mode",
"kv_router",
"--num-workers",
"2",
"--report-json",
str(report_path),
"--extra-engine-args",
'{"block_size":64,"speedup_ratio":1000.0}',
)
report = _assert_replay_cli_outputs(completed, report_path)
_assert_basic_report_counts(
report,
num_requests=4,
input_tokens=64,
output_tokens=2,
)
_assert_basic_report_metrics(report)
@@ -86,6 +86,9 @@ pub enum SequenceError {
#[error("Failed to publish event: {0}")]
PublishFailed(#[from] anyhow::Error),
#[error("Synchronous mutation requires replica_sync=false")]
SyncMutationRequiresNoReplicaSync,
}
/// Bundled parameters for adding a request to the sequence tracker.
@@ -364,7 +367,14 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
}
}
-pub async fn add_request(&self, req: SequenceRequest) -> Result<(), SequenceError> {
+fn ensure_sync_mutation_allowed(&self) -> Result<(), SequenceError> {
if self.replica_sync {
return Err(SequenceError::SyncMutationRequiresNoReplicaSync);
}
Ok(())
}
fn add_request_local(&self, req: SequenceRequest) -> Result<(), SequenceError> {
let SequenceRequest {
request_id,
token_sequence,
@@ -386,22 +396,6 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
});
}
-if self.replica_sync {
-let event = ActiveSequenceEvent {
-request_id: request_id.clone(),
-worker,
-data: ActiveSequenceEventData::AddRequest {
-token_sequence: token_sequence.clone(),
-isl,
-overlap,
-expected_output_tokens,
-},
-router_id: self.router_id,
-lora_name: lora_name.clone(),
-};
-self.publisher.publish_event(&event).await?;
-}
self.request_to_worker.insert(request_id.clone(), worker);
if let Some(lora) = lora_name {
@@ -434,12 +428,36 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
Ok(())
}
pub async fn add_request(&self, req: SequenceRequest) -> Result<(), SequenceError> {
if self.replica_sync {
let event = ActiveSequenceEvent {
request_id: req.request_id.clone(),
worker: req.worker,
data: ActiveSequenceEventData::AddRequest {
token_sequence: req.token_sequence.clone(),
isl: req.isl,
overlap: req.overlap,
expected_output_tokens: req.expected_output_tokens,
},
router_id: self.router_id,
lora_name: req.lora_name.clone(),
};
self.publisher.publish_event(&event).await?;
}
self.add_request_local(req)
}
pub fn add_request_sync(&self, req: SequenceRequest) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
self.add_request_local(req)
}
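The split above keeps event publishing in the async entry point and moves the state change into a shared local helper, so a synchronous caller can reuse the helper only when replica sync is off. A minimal Python sketch of that pattern follows. `MiniTracker` and `SyncMutationError` are hypothetical names, and the in-memory `published` list is a stand-in for the real publisher.

```python
import asyncio


class SyncMutationError(RuntimeError):
    """Hypothetical analogue of SequenceError::SyncMutationRequiresNoReplicaSync."""


class MiniTracker:
    def __init__(self, replica_sync):
        self.replica_sync = replica_sync
        self.request_to_worker = {}
        self.published = []  # stand-in for events sent to other replicas

    async def _publish(self, event):
        # Stand-in for an awaitable network publish.
        await asyncio.sleep(0)
        self.published.append(event)

    def _add_request_local(self, request_id, worker):
        # Pure local state change shared by both entry points.
        self.request_to_worker[request_id] = worker

    async def add_request(self, request_id, worker):
        # Async path: publish first when replicas must be kept in sync.
        if self.replica_sync:
            await self._publish(("add_request", request_id, worker))
        self._add_request_local(request_id, worker)

    def add_request_sync(self, request_id, worker):
        # Sync path: cannot await a publish, so refuse when replica sync is on.
        if self.replica_sync:
            raise SyncMutationError("Synchronous mutation requires replica_sync=false")
        self._add_request_local(request_id, worker)
```

The guard makes the synchronous variants fail loudly instead of silently skipping the replica event.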
/// Send a mutation to the worker assigned to a request, optionally publishing
/// a replica-sync event and cleaning up request mappings afterward.
-async fn mutate_request_worker(
+fn mutate_request_worker_local(
&self,
request_id: &RequestId,
-event_data: ActiveSequenceEventData,
mutate_fn: impl FnOnce(&mut ActiveSequences, &RequestId),
remove_mapping: bool,
) -> Result<(), SequenceError> {
@@ -451,22 +469,6 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
request_id: request_id.clone(),
})?;
-if self.replica_sync {
-let lora_name = self
-.request_to_lora
-.get(request_id)
-.map(|entry| entry.value().clone());
-let event = ActiveSequenceEvent {
-request_id: request_id.clone(),
-worker,
-data: event_data,
-router_id: self.router_id,
-lora_name,
-};
-self.publisher.publish_event(&event).await?;
-}
{
let table = self.workers.read();
let &idx = table
@@ -487,6 +489,40 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
Ok(())
}
async fn mutate_request_worker(
&self,
request_id: &RequestId,
event_data: ActiveSequenceEventData,
mutate_fn: impl FnOnce(&mut ActiveSequences, &RequestId),
remove_mapping: bool,
) -> Result<(), SequenceError> {
let worker = self
.request_to_worker
.get(request_id)
.map(|entry| *entry)
.ok_or_else(|| SequenceError::RequestNotFound {
request_id: request_id.clone(),
})?;
if self.replica_sync {
let lora_name = self
.request_to_lora
.get(request_id)
.map(|entry| entry.value().clone());
let event = ActiveSequenceEvent {
request_id: request_id.clone(),
worker,
data: event_data,
router_id: self.router_id,
lora_name,
};
self.publisher.publish_event(&event).await?;
}
self.mutate_request_worker_local(request_id, mutate_fn, remove_mapping)
}
/// Free all blocks associated with a request.
///
/// Note: This operation is idempotent. Calling it multiple times for the same request
@@ -508,6 +544,21 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
.await
}
pub fn free_sync(&self, request_id: &RequestId) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
if !self.request_to_worker.contains_key(request_id) {
tracing::debug!("Request {request_id} not found, already freed (idempotent)");
return Ok(());
}
self.mutate_request_worker_local(
request_id,
|seqs, rid| {
seqs.free(rid);
},
true,
)
}
/// Mark prefill as completed for a request.
///
/// Note: Calling this multiple times for the same request is allowed and will be a no-op
@@ -527,6 +578,17 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
.await
}
pub fn mark_prefill_completed_sync(&self, request_id: &RequestId) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
self.mutate_request_worker_local(
request_id,
|seqs, rid| {
seqs.mark_prefill_completed(rid);
},
false,
)
}
/// Add an output block with optional fractional decay weight.
///
/// This is used during generation to track output blocks as they are created.
......
@@ -40,6 +40,7 @@ use dynamo_llm::model_card::ModelDeploymentCard;
use dynamo_llm::preprocessor::prompt::{
ChatTemplate, ContextMixins, OAIChatLikeRequest, PromptFormatter,
};
use dynamo_mocker::loadgen::RouterSequence;
/// KV Router event subject suffix (appended to Component.subject())
/// Full subject format: namespace.{namespace}.component.{component}.kv-events
@@ -532,41 +533,34 @@ impl PrefixData {
}
/// Pre-generated sequence data for benchmarking
-#[derive(Clone)]
-struct SequenceData {
-worker_id: WorkerId,
-local_hashes: Vec<LocalBlockHash>,
-external_hashes: Vec<ExternalSequenceBlockHash>,
-}
-impl SequenceData {
-/// Create a sequence from the exact request content.
-fn from_request_content(
+type SequenceData = RouterSequence;
+fn sequence_from_request_content(
content: &str,
worker_id: WorkerId,
kv_block_size: u32,
tokenizer: &Tokenizer,
prompt_renderer: Option<&PromptRenderer>,
-) -> Result<Self> {
+) -> Result<SequenceData> {
let (local_hashes, external_hashes) =
compute_hashes_for_content(content, tokenizer, kv_block_size, prompt_renderer)?;
-Ok(Self {
+Ok(SequenceData {
worker_id,
local_hashes,
external_hashes,
})
}
-fn to_router_event(&self, event_id: u64) -> RouterEvent {
+fn sequence_to_router_event(sequence: &SequenceData, event_id: u64) -> RouterEvent {
let kv_event = KvCacheEvent {
event_id,
data: KvCacheEventData::Stored(KvCacheStoreData {
parent_hash: None,
-blocks: self
+blocks: sequence
.local_hashes
.iter()
-.zip(self.external_hashes.iter())
+.zip(sequence.external_hashes.iter())
.map(|(local, ext)| KvCacheStoredBlockData {
block_hash: *ext,
tokens_hash: *local,
@@ -576,8 +570,7 @@ impl SequenceData {
}),
dp_rank: 0,
};
-RouterEvent::new(self.worker_id, kv_event)
-}
+RouterEvent::new(sequence.worker_id, kv_event)
}
/// Response from the frontend's /health endpoint
@@ -692,7 +685,7 @@ fn generate_sequences_for_requests(
num_prefix_prompts,
seed,
);
-let seq = SequenceData::from_request_content(
+let seq = sequence_from_request_content(
&content,
worker_id,
kv_block_size,
@@ -749,7 +742,7 @@ async fn build_tree_via_nats(
};
for (event_id, seq) in sequences.iter().enumerate() {
-let event = seq.to_router_event(event_id as u64);
+let event = sequence_to_router_event(seq, event_id as u64);
let data = encode_event_with_envelope(&event, KV_EVENT_SUBJECT)?;
nats_client
.publish(subject.clone(), data.into())
@@ -1165,7 +1158,7 @@ async fn publish_events_at_rate(
while start.elapsed() < duration {
let seq = &sequences[(event_id as usize) % sequences.len()];
-let event = sequence_to_router_event_placeholder
......
@@ -11,5 +11,6 @@ pub mod cache;
pub mod common;
pub mod engine;
pub mod kv_manager;
pub mod loadgen;
pub mod replay;
pub mod scheduler;
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap};
use anyhow::{Result, anyhow, bail};
use uuid::Uuid;
use super::types::{ReadyTurn, Trace, TurnTrace};
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DriverMode {
Trace,
Concurrency,
}
#[derive(Debug)]
struct SessionRuntime {
session_id: String,
turns: Vec<TurnTrace>,
next_turn_index: usize,
next_ready_at_ms: Option<f64>,
in_flight: Option<Uuid>,
}
#[derive(Debug)]
struct InFlightTurn {
session_index: usize,
turn_index: usize,
}
#[derive(Debug, Clone, Copy)]
struct ReadySession {
ready_at_ms: f64,
session_index: usize,
turn_index: usize,
}
impl PartialEq for ReadySession {
fn eq(&self, other: &Self) -> bool {
self.ready_at_ms.to_bits() == other.ready_at_ms.to_bits()
&& self.session_index == other.session_index
&& self.turn_index == other.turn_index
}
}
impl Eq for ReadySession {}
impl Ord for ReadySession {
fn cmp(&self, other: &Self) -> Ordering {
other
.ready_at_ms
.total_cmp(&self.ready_at_ms)
.then_with(|| other.session_index.cmp(&self.session_index))
.then_with(|| other.turn_index.cmp(&self.turn_index))
}
}
impl PartialOrd for ReadySession {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
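Rust's `BinaryHeap` is a max-heap, so the `Ord` implementation above reverses every comparison to make the earliest `ready_at_ms` pop first, with `session_index` and then `turn_index` as tie-breakers. Python's `heapq` is already a min-heap, so the same ordering falls out of plain tuple comparison. The sketch below is illustrative only. `drain_in_order` is a hypothetical helper, and unlike `f64::total_cmp` it does not define an ordering for NaN timestamps.

```python
import heapq


def drain_in_order(entries):
    """Pop (ready_at_ms, session_index, turn_index) tuples earliest-first."""
    heap = list(entries)
    heapq.heapify(heap)
    ordered = []
    while heap:
        # Tuples compare field by field, which gives the same tie-breaking
        # order as the reversed Ord implementation in the Rust driver.
        ordered.append(heapq.heappop(heap))
    return ordered
```

For example, entries that share a ready time are drained in ascending session order, and a later ready time always comes after an earlier one regardless of session index.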
#[derive(Debug)]
pub struct WorkloadDriver {
mode: DriverMode,
block_size: usize,
sessions: Vec<SessionRuntime>,
in_flight: HashMap<Uuid, InFlightTurn>,
ready_sessions: BinaryHeap<ReadySession>,
}
impl WorkloadDriver {
pub(crate) fn new_trace(trace: Trace) -> Result<Self> {
Self::new(trace, DriverMode::Trace)
}
pub(crate) fn new_concurrency(trace: Trace) -> Result<Self> {
Self::new(trace, DriverMode::Concurrency)
}
fn new(trace: Trace, mode: DriverMode) -> Result<Self> {
let sessions: Vec<SessionRuntime> = trace
.sessions
.into_iter()
.map(|session| SessionRuntime {
session_id: session.session_id,
turns: session.turns,
next_turn_index: 0,
next_ready_at_ms: Some(match mode {
DriverMode::Trace => session.first_arrival_timestamp_ms.unwrap_or(0.0),
DriverMode::Concurrency => 0.0,
}),
in_flight: None,
})
.collect();
let ready_sessions = sessions
.iter()
.enumerate()
.filter_map(|(session_index, session)| {
Some(ReadySession {
ready_at_ms: session.next_ready_at_ms?,
session_index,
turn_index: session.next_turn_index,
})
})
.collect();
Ok(Self {
mode,
block_size: trace.block_size,
sessions,
in_flight: HashMap::new(),
ready_sessions,
})
}
pub fn pop_ready(&mut self, now_ms: f64, limit: usize) -> Vec<ReadyTurn> {
if limit == 0 {
return Vec::new();
}
let mut emitted = Vec::new();
while emitted.len() < limit {
let Some(ready_session) = self.ready_sessions.pop() else {
break;
};
if ready_session.ready_at_ms > now_ms {
self.ready_sessions.push(ready_session);
break;
}
let session_index = ready_session.session_index;
let session = &mut self.sessions[session_index];
if session.in_flight.is_some()
|| session.next_turn_index != ready_session.turn_index
|| session.next_ready_at_ms != Some(ready_session.ready_at_ms)
{
continue;
}
let turn_index = session.next_turn_index;
let scheduled_ready_at_ms = session
.next_ready_at_ms
.expect("ready session must have a timestamp");
let request_uuid = Uuid::new_v4();
let replay_hashes = session.turns[turn_index]
.to_replay_hashes(self.block_size)
.expect("validated trace should always synthesize replay hashes");
let arrival_timestamp_ms = match self.mode {
DriverMode::Trace => Some(scheduled_ready_at_ms),
DriverMode::Concurrency => None,
};
let request = session.turns[turn_index]
.to_direct_request(self.block_size, request_uuid, arrival_timestamp_ms)
.expect("validated trace should always synthesize into a direct request");
session.in_flight = Some(request_uuid);
session.next_ready_at_ms = None;
self.in_flight.insert(
request_uuid,
InFlightTurn {
session_index,
turn_index,
},
);
emitted.push(ReadyTurn {
request_uuid,
session_id: session.session_id.clone(),
turn_index,
scheduled_ready_at_ms,
replay_hashes: Some(replay_hashes),
request,
});
}
emitted
}
pub fn on_complete(&mut self, request_uuid: Uuid, now_ms: f64) -> Result<()> {
let in_flight = self
.in_flight
.remove(&request_uuid)
.ok_or_else(|| anyhow!("unknown workload request completion for {request_uuid}"))?;
let session = self
.sessions
.get_mut(in_flight.session_index)
.ok_or_else(|| anyhow!("unknown workload session {}", in_flight.session_index))?;
if session.in_flight != Some(request_uuid) {
bail!(
"session {} completion for {} does not match in-flight request {:?}",
session.session_id,
request_uuid,
session.in_flight
);
}
session.in_flight = None;
session.next_turn_index = in_flight.turn_index + 1;
if session.next_turn_index < session.turns.len() {
let ready_at_ms =
now_ms + session.turns[session.next_turn_index].delay_after_previous_ms;
session.next_ready_at_ms = Some(ready_at_ms);
self.ready_sessions.push(ReadySession {
ready_at_ms,
session_index: in_flight.session_index,
turn_index: session.next_turn_index,
});
} else {
session.next_ready_at_ms = None;
}
Ok(())
}
pub fn next_ready_time_ms(&mut self) -> Option<f64> {
loop {
let ready_session = *self.ready_sessions.peek()?;
let session = &self.sessions[ready_session.session_index];
if session.in_flight.is_some()
|| session.next_turn_index != ready_session.turn_index
|| session.next_ready_at_ms != Some(ready_session.ready_at_ms)
{
self.ready_sessions.pop();
continue;
}
return Some(ready_session.ready_at_ms);
}
}
pub fn is_drained(&self) -> bool {
self.in_flight.is_empty()
&& self
.sessions
.iter()
.all(|session| session.next_turn_index >= session.turns.len())
}
}
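The driver above follows a simple lifecycle: a session's next turn becomes ready at a timestamp, `pop_ready` emits it and marks the session in flight, and `on_complete` schedules the following turn at the completion time plus that turn's delay. Heap entries that no longer match the session state are skipped rather than removed eagerly. A minimal Python sketch of that lifecycle follows. `Session` and `MiniDriver` are hypothetical names, requests are keyed by session index instead of UUID, and request and hash synthesis are omitted.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    delays_ms: List[float]  # delay before each turn; index 0 is unused
    next_turn: int = 0
    next_ready_at_ms: Optional[float] = 0.0
    in_flight: bool = False


class MiniDriver:
    def __init__(self, sessions):
        self.sessions = sessions
        self.heap = [(s.next_ready_at_ms, i, s.next_turn) for i, s in enumerate(sessions)]
        heapq.heapify(self.heap)

    def pop_ready(self, now_ms, limit):
        emitted = []
        while len(emitted) < limit and self.heap:
            ready_at, index, turn = self.heap[0]
            if ready_at > now_ms:
                break  # earliest entry is still in the future
            heapq.heappop(self.heap)
            session = self.sessions[index]
            # Lazy deletion: drop entries that no longer match session state.
            if (session.in_flight or session.next_turn != turn
                    or session.next_ready_at_ms != ready_at):
                continue
            session.in_flight = True
            session.next_ready_at_ms = None
            emitted.append((index, turn))
        return emitted

    def on_complete(self, index, now_ms):
        session = self.sessions[index]
        if not session.in_flight:
            raise ValueError(f"session {index} has no in-flight turn")
        session.in_flight = False
        session.next_turn += 1
        if session.next_turn < len(session.delays_ms):
            # The next turn is scheduled relative to the completion time.
            ready_at = now_ms + session.delays_ms[session.next_turn]
            session.next_ready_at_ms = ready_at
            heapq.heappush(self.heap, (ready_at, index, session.next_turn))

    def is_drained(self):
        return all(not s.in_flight and s.next_turn >= len(s.delays_ms)
                   for s in self.sessions)
```

Because a session has at most one turn in flight, a slow turn delays every later turn of the same session without blocking other sessions.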
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
mod driver;
mod trace;
mod types;
pub use driver::WorkloadDriver;
pub use types::{
ArrivalSpec, DelaySpec, LengthSpec, ReadyTurn, ReplayRequestHashes, RouterSequence,
SequenceHashMode, SessionPartitionSpec, SessionTrace, SyntheticTraceSpec, Trace, TurnTrace,
};
#[cfg(test)]
mod tests;
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use dynamo_kv_router::protocols::{compute_block_hash_for_seq, compute_seq_hash_for_block};
use tempfile::NamedTempFile;
use uuid::Uuid;
use super::*;
fn write_trace(lines: &[serde_json::Value]) -> NamedTempFile {
let mut file = NamedTempFile::new().unwrap();
for line in lines {
use std::io::Write;
writeln!(file, "{}", serde_json::to_string(line).unwrap()).unwrap();
}
file
}
#[test]
fn test_from_mooncake_single_turn_preserves_fields() {
let file = write_trace(&[serde_json::json!({
"timestamp": 123.0,
"input_length": 8,
"output_length": 4,
"hash_ids": [7, 8],
})]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 1);
let session = &trace.sessions[0];
assert_eq!(session.first_arrival_timestamp_ms, Some(123.0));
assert_eq!(session.turns.len(), 1);
assert_eq!(session.turns[0].input_length, 8);
assert_eq!(session.turns[0].max_output_tokens, 4);
assert_eq!(session.turns[0].hash_ids, vec![7, 8]);
}
#[test]
fn test_from_mooncake_multi_turn_uses_session_id_and_delay() {
let file = write_trace(&[
serde_json::json!({
"session_id": "a",
"timestamp": 10.0,
"input_length": 4,
"output_length": 1,
"hash_ids": [1],
}),
serde_json::json!({
"session_id": "a",
"delay": 25.0,
"input_length": 8,
"output_length": 2,
"hash_ids": [1, 2],
}),
serde_json::json!({
"session_id": "b",
"timestamp": 20.0,
"input_length": 4,
"output_length": 1,
"hash_ids": [3],
}),
]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 2);
assert_eq!(trace.sessions[0].session_id, "a");
assert_eq!(trace.sessions[0].turns.len(), 2);
assert_eq!(trace.sessions[0].turns[1].delay_after_previous_ms, 25.0);
assert_eq!(trace.sessions[1].session_id, "b");
}
#[test]
fn test_from_mooncake_defaults_missing_input_length_from_hash_capacity() {
let file = write_trace(&[serde_json::json!({
"timestamp": 7.0,
"output_length": 3,
"hash_ids": [5, 6],
})]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 1);
assert_eq!(trace.sessions[0].turns[0].input_length, 8);
}

#[test]
fn test_turn_to_direct_request_repeats_hash_ids_by_block_size() {
    let turn = TurnTrace {
        input_length: 6,
        max_output_tokens: 3,
        hash_ids: vec![1, 2],
        delay_after_previous_ms: 0.0,
    };
    let request = turn
        .to_direct_request(4, Uuid::from_u128(1), Some(5.0))
        .unwrap();
    assert_eq!(request.tokens, vec![1, 1, 1, 1, 2, 2]);
    assert_eq!(request.arrival_timestamp_ms, Some(5.0));
}

#[test]
fn test_turn_replay_hashes_match_full_blocks_only() {
    let turn = TurnTrace {
        input_length: 6,
        max_output_tokens: 3,
        hash_ids: vec![1, 2],
        delay_after_previous_ms: 0.0,
    };
    let request = turn
        .to_direct_request(4, Uuid::from_u128(1), Some(5.0))
        .unwrap();
    let replay_hashes = turn.to_replay_hashes(4).unwrap();
    let expected_local = compute_block_hash_for_seq(&request.tokens, 4, None, None);
    assert_eq!(replay_hashes.local_block_hashes, expected_local);
    assert_eq!(
        replay_hashes.sequence_hashes,
        compute_seq_hash_for_block(&expected_local)
    );
    // 6 tokens with block size 4 contain one full block; the partial block is not hashed.
    assert_eq!(replay_hashes.local_block_hashes.len(), 1);
}

#[test]
fn test_partition_by_session_round_robin_keeps_sessions_intact() {
    let trace = Trace::synthetic(SyntheticTraceSpec {
        block_size: 4,
        num_sessions: 4,
        turns_per_session: 2,
        input_tokens: LengthSpec {
            mean: 8,
            stddev: 0.0,
        },
        output_tokens: LengthSpec {
            mean: 2,
            stddev: 0.0,
        },
        shared_prefix_ratio: 0.5,
        num_prefix_groups: 2,
        first_turn_arrivals: ArrivalSpec::Burst,
        inter_turn_delays: DelaySpec::ConstantMs(5.0),
        seed: 7,
    })
    .unwrap();
    let partitions =
        trace.partition_by_session(SessionPartitionSpec::RoundRobin { num_partitions: 2 });
    assert_eq!(partitions.len(), 2);
    assert_eq!(partitions[0].sessions.len(), 2);
    assert_eq!(partitions[1].sessions.len(), 2);
    assert!(
        partitions
            .iter()
            .flat_map(|partition| partition.sessions.iter())
            .all(|session| session.turns.len() == 2)
    );
}

#[test]
fn test_synthetic_prefix_groups_share_prefixes_within_group() {
    let trace = Trace::synthetic(SyntheticTraceSpec {
        block_size: 4,
        num_sessions: 6,
        turns_per_session: 1,
        input_tokens: LengthSpec {
            mean: 16,
            stddev: 0.0,
        },
        output_tokens: LengthSpec {
            mean: 2,
            stddev: 0.0,
        },
        shared_prefix_ratio: 0.5,
        num_prefix_groups: 2,
        first_turn_arrivals: ArrivalSpec::Burst,
        inter_turn_delays: DelaySpec::None,
        seed: 42,
    })
    .unwrap();
    let prefix_len = 2;
    let prefixes = trace
        .sessions
        .iter()
        .map(|session| session.turns[0].hash_ids[..prefix_len].to_vec())
        .collect::<Vec<_>>();
    assert!(prefixes.windows(2).any(|window| window[0] == window[1]));
}

#[test]
fn test_expand_hash_prefix_depth_scales_hashes_and_input_length() {
    let trace = Trace {
        block_size: 4,
        sessions: vec![SessionTrace {
            session_id: "session".to_string(),
            first_arrival_timestamp_ms: Some(10.0),
            turns: vec![TurnTrace {
                input_length: 6,
                max_output_tokens: 2,
                hash_ids: vec![7, 8],
                delay_after_previous_ms: 0.0,
            }],
        }],
    }
    .expand_hash_prefix_depth(3);
    let turn = &trace.sessions[0].turns[0];
    assert_eq!(turn.input_length, 18);
    assert_eq!(turn.hash_ids, vec![21, 22, 23, 24, 25, 26]);
    let request = turn
        .to_direct_request(trace.block_size, Uuid::from_u128(2), Some(10.0))
        .unwrap();
    assert_eq!(request.tokens.len(), 18);
}

#[test]
fn test_rescale_ready_span_scales_session_starts_and_inter_turn_delays() {
    let trace = Trace {
        block_size: 4,
        sessions: vec![
            SessionTrace {
                session_id: "a".to_string(),
                first_arrival_timestamp_ms: Some(10.0),
                turns: vec![
                    TurnTrace {
                        input_length: 4,
                        max_output_tokens: 1,
                        hash_ids: vec![1],
                        delay_after_previous_ms: 0.0,
                    },
                    TurnTrace {
                        input_length: 4,
                        max_output_tokens: 1,
                        hash_ids: vec![2],
                        delay_after_previous_ms: 20.0,
                    },
                ],
            },
            SessionTrace {
                session_id: "b".to_string(),
                first_arrival_timestamp_ms: Some(30.0),
                turns: vec![TurnTrace {
                    input_length: 4,
                    max_output_tokens: 1,
                    hash_ids: vec![3],
                    delay_after_previous_ms: 0.0,
                }],
            },
        ],
    }
    .rescale_ready_span(100)
    .unwrap();
    // Session starts span 10..30 ms (20 ms); rescaling to 100 ms shifts the first
    // start to 0 and multiplies every gap, including inter-turn delays, by 5.
    assert_eq!(trace.sessions[0].first_arrival_timestamp_ms, Some(0.0));
    assert_eq!(trace.sessions[1].first_arrival_timestamp_ms, Some(100.0));
    assert_eq!(trace.sessions[0].turns[1].delay_after_previous_ms, 100.0);
}

#[test]
fn test_driver_requires_completion_before_follow_up_turn() {
    let trace = Trace {
        block_size: 4,
        sessions: vec![SessionTrace {
            session_id: "s".to_string(),
            first_arrival_timestamp_ms: Some(0.0),
            turns: vec![
                TurnTrace {
                    input_length: 4,
                    max_output_tokens: 1,
                    hash_ids: vec![1],
                    delay_after_previous_ms: 0.0,
                },
                TurnTrace {
                    input_length: 4,
                    max_output_tokens: 1,
                    hash_ids: vec![2],
                    delay_after_previous_ms: 10.0,
                },
            ],
        }],
    };
    let mut driver = trace.into_trace_driver().unwrap();
    let first = driver.pop_ready(0.0, 1);
    assert_eq!(first.len(), 1);
    // The follow-up turn stays blocked while the first turn is in flight.
    assert!(driver.pop_ready(100.0, 1).is_empty());
    // Completion at 5 ms plus the 10 ms delay makes the second turn ready at 15 ms.
    driver.on_complete(first[0].request_uuid, 5.0).unwrap();
    assert!(driver.pop_ready(14.0, 1).is_empty());
    let second = driver.pop_ready(15.0, 1);
    assert_eq!(second.len(), 1);
    assert_eq!(second[0].turn_index, 1);
}

#[test]
fn test_driver_next_ready_time_tracks_earliest_pending_turn() {
    let trace = Trace {
        block_size: 4,
        sessions: vec![
            SessionTrace {
                session_id: "a".to_string(),
                first_arrival_timestamp_ms: Some(10.0),
                turns: vec![
                    TurnTrace {
                        input_length: 4,
                        max_output_tokens: 1,
                        hash_ids: vec![1],
                        delay_after_previous_ms: 0.0,
                    },
                    TurnTrace {
                        input_length: 4,
                        max_output_tokens: 1,
                        hash_ids: vec![2],
                        delay_after_previous_ms: 5.0,
                    },
                ],
            },
            SessionTrace {
                session_id: "b".to_string(),
                first_arrival_timestamp_ms: Some(20.0),
                turns: vec![TurnTrace {
                    input_length: 4,
                    max_output_tokens: 1,
                    hash_ids: vec![3],
                    delay_after_previous_ms: 0.0,
                }],
            },
        ],
    };
    let mut driver = trace.into_trace_driver().unwrap();
    assert_eq!(driver.next_ready_time_ms(), Some(10.0));
    let first = driver.pop_ready(10.0, 1);
    assert_eq!(first.len(), 1);
    assert_eq!(driver.next_ready_time_ms(), Some(20.0));
    // Session "a" completes at 25 ms, so its second turn is ready at 30 ms;
    // session "b" at 20 ms is still the earliest pending turn.
    driver.on_complete(first[0].request_uuid, 25.0).unwrap();
    assert_eq!(driver.next_ready_time_ms(), Some(20.0));
    let second = driver.pop_ready(20.0, 1);
    assert_eq!(second.len(), 1);
    assert_eq!(driver.next_ready_time_ms(), Some(30.0));
}

#[test]
fn test_trace_driver_round_trips_turn_semantics_into_ready_requests() {
    let trace = Trace {
        block_size: 2,
        sessions: vec![
            SessionTrace {
                session_id: "session-a".to_string(),
                first_arrival_timestamp_ms: Some(10.0),
                turns: vec![
                    TurnTrace {
                        input_length: 4,
                        max_output_tokens: 2,
                        hash_ids: vec![1, 2],
                        delay_after_previous_ms: 0.0,
                    },
                    TurnTrace {
                        input_length: 2,
                        max_output_tokens: 3,
                        hash_ids: vec![3],
                        delay_after_previous_ms: 5.0,
                    },
                ],
            },
            SessionTrace {
                session_id: "session-b".to_string(),
                first_arrival_timestamp_ms: Some(12.0),
                turns: vec![TurnTrace {
                    input_length: 2,
                    max_output_tokens: 1,
                    hash_ids: vec![4],
                    delay_after_previous_ms: 0.0,
                }],
            },
        ],
    };
    let expected = trace.clone();
    let mut driver = trace.into_trace_driver().unwrap();
    assert!(driver.pop_ready(9.0, usize::MAX).is_empty());

    let first = driver.pop_ready(10.0, usize::MAX);
    assert_eq!(first.len(), 1);
    let first = &first[0];
    assert_eq!(first.session_id, "session-a");
    assert_eq!(first.turn_index, 0);
    assert_eq!(first.scheduled_ready_at_ms, 10.0);
    assert_eq!(
        first.request.tokens.len(),
        expected.sessions[0].turns[0].input_length
    );
    assert_eq!(
        first.request.max_output_tokens,
        expected.sessions[0].turns[0].max_output_tokens
    );
    assert_eq!(first.request.arrival_timestamp_ms, Some(10.0));
    assert_eq!(
        first.replay_hashes.as_ref(),
        Some(
            &expected.sessions[0].turns[0]
                .to_replay_hashes(expected.block_size)
                .unwrap()
        )
    );
    let expected_first_request = expected.sessions[0].turns[0]
        .to_direct_request(expected.block_size, first.request_uuid, Some(10.0))
        .unwrap();
    assert_eq!(first.request.tokens, expected_first_request.tokens);
    assert_eq!(
        first.request.max_output_tokens,
        expected_first_request.max_output_tokens
    );
    assert_eq!(first.request.uuid, expected_first_request.uuid);
    assert_eq!(
        first.request.arrival_timestamp_ms,
        expected_first_request.arrival_timestamp_ms
    );

    let second = driver.pop_ready(12.0, usize::MAX);
    assert_eq!(second.len(), 1);
    let second = &second[0];
    assert_eq!(second.session_id, "session-b");
    assert_eq!(second.turn_index, 0);
    assert_eq!(second.scheduled_ready_at_ms, 12.0);
    assert_eq!(
        second.request.tokens.len(),
        expected.sessions[1].turns[0].input_length
    );
    assert_eq!(
        second.request.max_output_tokens,
        expected.sessions[1].turns[0].max_output_tokens
    );
    assert_eq!(second.request.arrival_timestamp_ms, Some(12.0));
    assert_eq!(
        second.replay_hashes.as_ref(),
        Some(
            &expected.sessions[1].turns[0]
                .to_replay_hashes(expected.block_size)
                .unwrap()
        )
    );

    driver.on_complete(first.request_uuid, 20.0).unwrap();
    assert!(driver.pop_ready(24.0, usize::MAX).is_empty());
    let third = driver.pop_ready(25.0, usize::MAX);
    assert_eq!(third.len(), 1);
    let third = &third[0];
    assert_eq!(third.session_id, "session-a");
    assert_eq!(third.turn_index, 1);
    assert_eq!(third.scheduled_ready_at_ms, 25.0);
    assert_eq!(
        third.request.tokens.len(),
        expected.sessions[0].turns[1].input_length
    );
    assert_eq!(
        third.request.max_output_tokens,
        expected.sessions[0].turns[1].max_output_tokens
    );
    assert_eq!(third.request.arrival_timestamp_ms, Some(25.0));
    assert_eq!(
        third.replay_hashes.as_ref(),
        Some(
            &expected.sessions[0].turns[1]
                .to_replay_hashes(expected.block_size)
                .unwrap()
        )
    );
    let expected_third_request = expected.sessions[0].turns[1]
        .to_direct_request(expected.block_size, third.request_uuid, Some(25.0))
        .unwrap();
    assert_eq!(third.request.tokens, expected_third_request.tokens);
    assert_eq!(
        third.request.max_output_tokens,
        expected_third_request.max_output_tokens
    );
    assert_eq!(third.request.uuid, expected_third_request.uuid);
    assert_eq!(
        third.request.arrival_timestamp_ms,
        expected_third_request.arrival_timestamp_ms
    );
}
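
The driver tests pin down one scheduling rule: a session's follow-up turn is not released until the previous turn completes, and then only once `delay_after_previous_ms` has elapsed since the completion time. The sketch below models that rule for a single session. `SketchSession`, `SessionState`, and their methods are hypothetical names used for illustration; this is not the PR's `into_trace_driver` implementation, which also tracks request UUIDs, a `pop_ready` limit, and multiple sessions.

```rust
/// Hypothetical single-session model of the gating rule the tests exercise.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SessionState {
    /// The next turn may be released at or after this time (ms).
    ReadyAt(f64),
    /// A turn has been released and has not completed yet.
    InFlight,
    /// Every turn has been released and completed.
    Done,
}

struct SketchSession {
    /// `delay_after_previous_ms` per turn; entry 0 is ignored (assumption).
    delays_ms: Vec<f64>,
    next_turn: usize,
    state: SessionState,
}

impl SketchSession {
    fn new(first_arrival_ms: f64, delays_ms: Vec<f64>) -> Self {
        let state = if delays_ms.is_empty() {
            SessionState::Done
        } else {
            SessionState::ReadyAt(first_arrival_ms)
        };
        Self { delays_ms, next_turn: 0, state }
    }

    /// Releases the next turn index if it is ready at `now_ms`.
    fn pop_ready(&mut self, now_ms: f64) -> Option<usize> {
        match self.state {
            SessionState::ReadyAt(ready_at) if ready_at <= now_ms => {
                let turn_index = self.next_turn;
                self.next_turn += 1;
                self.state = SessionState::InFlight;
                Some(turn_index)
            }
            _ => None,
        }
    }

    /// Marks the in-flight turn complete and schedules the follow-up, if any.
    fn on_complete(&mut self, completed_at_ms: f64) {
        if self.state != SessionState::InFlight {
            return;
        }
        self.state = match self.delays_ms.get(self.next_turn) {
            Some(delay_ms) => SessionState::ReadyAt(completed_at_ms + delay_ms),
            None => SessionState::Done,
        };
    }

    /// Time at which the next turn becomes ready, if one is pending.
    fn next_ready_time_ms(&self) -> Option<f64> {
        match self.state {
            SessionState::ReadyAt(ready_at) => Some(ready_at),
            _ => None,
        }
    }
}
```

With delays `[0.0, 10.0]` and a first arrival at 0 ms, the sketch follows the same timeline as `test_driver_requires_completion_before_follow_up_turn`: the first turn is released at 0 ms, nothing is released while it is in flight, and after completion at 5 ms the second turn becomes ready at 15 ms.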
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
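
use dynamo_kv_router::LocalBlockHash;
use dynamo_kv_router::protocols::{ExternalSequenceBlockHash, WorkerId};
use dynamo_tokens::SequenceHash;
use uuid::Uuid;

use crate::common::protocols::DirectRequest;

/// A replay workload: a set of sessions that share one block size.
#[derive(Debug, Clone)]
pub struct Trace {
    pub block_size: usize,
    pub sessions: Vec<SessionTrace>,
}

/// One session: an ordered list of turns plus the arrival time of the first turn.
#[derive(Debug, Clone)]
pub struct SessionTrace {
    pub session_id: String,
    pub first_arrival_timestamp_ms: Option<f64>,
    pub turns: Vec<TurnTrace>,
}

/// One request within a session.
#[derive(Debug, Clone)]
pub struct TurnTrace {
    pub input_length: usize,
    pub max_output_tokens: usize,
    /// Block-level hash IDs; each ID stands for one block of `block_size` tokens.
    pub hash_ids: Vec<u64>,
    /// Delay, in milliseconds, between the previous turn completing and this turn becoming ready.
    pub delay_after_previous_ms: f64,
}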

#[derive(Debug, Clone)]
pub struct LengthSpec {
    pub mean: usize,
    pub stddev: f64,
}

#[derive(Debug, Clone)]
pub enum ArrivalSpec {
    Burst,
    ConstantQps { qps: f64 },
    PoissonQps { qps: f64 },
    GammaQps { qps: f64, smoothness: f64 },
}

#[derive(Debug, Clone)]
pub enum DelaySpec {
    None,
    ConstantMs(f64),
    ExponentialMs { mean_ms: f64 },
}

#[derive(Debug, Clone)]
pub struct SyntheticTraceSpec {
    pub block_size: usize,
    pub num_sessions: usize,
    pub turns_per_session: usize,
    pub input_tokens: LengthSpec,
    pub output_tokens: LengthSpec,
    pub shared_prefix_ratio: f64,
    pub num_prefix_groups: usize,
    pub first_turn_arrivals: ArrivalSpec,
    pub inter_turn_delays: DelaySpec,
    pub seed: u64,
}

#[derive(Debug, Clone, Copy)]
pub enum SequenceHashMode {
    Raw,
    Cumulative,
}

#[derive(Debug, Clone, Copy)]
pub enum SessionPartitionSpec {
    Random { num_partitions: usize, seed: u64 },
    RoundRobin { num_partitions: usize },
}

#[derive(Debug, Clone)]
pub struct RouterSequence {
    pub worker_id: WorkerId,
    pub local_hashes: Vec<LocalBlockHash>,
    pub external_hashes: Vec<ExternalSequenceBlockHash>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplayRequestHashes {
    pub local_block_hashes: Vec<LocalBlockHash>,
    pub sequence_hashes: Vec<SequenceHash>,
}

#[derive(Debug, Clone)]
pub struct ReadyTurn {
    pub request_uuid: Uuid,
    pub session_id: String,
    pub turn_index: usize,
    pub scheduled_ready_at_ms: f64,
    pub replay_hashes: Option<ReplayRequestHashes>,
    pub request: DirectRequest,
}
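
The `hash_ids` and `input_length` fields of `TurnTrace` together determine the synthetic token sequence of a request. Judging from `test_turn_to_direct_request_repeats_hash_ids_by_block_size`, each hash ID is repeated `block_size` times and the result is truncated to `input_length`, and `test_turn_replay_hashes_match_full_blocks_only` suggests that only full blocks contribute replay hashes. A minimal sketch of that expansion follows. `expand_hash_ids` and `full_block_count` are hypothetical helpers, the `u32` token type and the capacity check are assumptions, and none of this is the PR's `to_direct_request` code.

```rust
/// Hypothetical helper: repeat each hash ID `block_size` times, then truncate
/// to `input_length` tokens. Returns `None` when the IDs cannot cover the
/// requested length (an assumed validation rule, not confirmed by the source).
fn expand_hash_ids(hash_ids: &[u64], block_size: usize, input_length: usize) -> Option<Vec<u32>> {
    let capacity = hash_ids.len().checked_mul(block_size)?;
    if block_size == 0 || capacity < input_length {
        return None;
    }
    let tokens = hash_ids
        .iter()
        // Assumption: token IDs are `u32`; larger hash IDs would be truncated here.
        .flat_map(|&id| std::iter::repeat(id as u32).take(block_size))
        .take(input_length)
        .collect();
    Some(tokens)
}

/// Hypothetical helper: number of complete blocks in a sequence of
/// `input_length` tokens; a trailing partial block is not counted.
fn full_block_count(input_length: usize, block_size: usize) -> usize {
    if block_size == 0 {
        0
    } else {
        input_length / block_size
    }
}
```

For the values used in the tests (`hash_ids = [1, 2]`, `block_size = 4`, `input_length = 6`), the sketch yields the tokens `[1, 1, 1, 1, 2, 2]` and one full block, which matches the assertions on `request.tokens` and `local_block_hashes.len()`.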