# Disaggregated Prefill V1

This example contains scripts that demonstrate disaggregated prefill in vLLM's offline inference setting.

## Files

- `run.sh` - A helper script that will run `prefill_example.py` and `decode_example.py` sequentially.
  - Make sure you are in the `examples/offline_inference/disaggregated-prefill-v1` directory before running `run.sh`.
- `prefill_example.py` - A script that performs only the prefill step, saving the KV state to the `local_storage` directory and the prompts to `output.txt`.
- `decode_example.py` - A script that performs only the decode step, loading the KV state from the `local_storage` directory and the prompts from `output.txt`.
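The handoff between the two scripts can be illustrated with a toy sketch that does not use vLLM at all. Everything in it is hypothetical: `toy_prefill`, `toy_decode`, the one-JSON-file-per-prompt layout, and the hash-based file names are assumptions made for illustration. The "KV state" is just a list of character codes, not real attention keys and values. The sketch only shows the pattern described above: one phase persists state to `local_storage` and writes the prompts to `output.txt`, and a separate phase reads both back.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def _key(prompt: str) -> str:
    # Hypothetical naming scheme: one file per prompt, keyed by a hash of its text.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


def toy_prefill(prompts: list[str], workdir: Path) -> None:
    """Toy stand-in for the prefill phase: compute per-prompt state and persist it."""
    storage = workdir / "local_storage"
    storage.mkdir(parents=True, exist_ok=True)
    for prompt in prompts:
        # Placeholder "KV state": character codes instead of real keys/values.
        state = {"prompt": prompt, "kv": [ord(c) for c in prompt]}
        (storage / f"{_key(prompt)}.json").write_text(json.dumps(state), encoding="utf-8")
    # Hand the prompts to the decode phase as plain text, one prompt per line
    # (assumes prompts contain no newlines).
    (workdir / "output.txt").write_text("".join(p + "\n" for p in prompts), encoding="utf-8")


def toy_decode(workdir: Path) -> dict[str, list[int]]:
    """Toy stand-in for the decode phase: reload the prompts and their saved state."""
    storage = workdir / "local_storage"
    prompts = (workdir / "output.txt").read_text(encoding="utf-8").splitlines()
    loaded = {}
    for prompt in prompts:
        state = json.loads((storage / f"{_key(prompt)}.json").read_text(encoding="utf-8"))
        # A real decoder would continue generation from this state instead of recomputing it.
        loaded[prompt] = state["kv"]
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        toy_prefill(["Hello", "Hi"], Path(d))
        print(toy_decode(Path(d)))  # {'Hello': [72, 101, 108, 108, 111], 'Hi': [72, 105]}
```

Because the two phases communicate only through files, the decode step can run in a separate process, as `run.sh` does when it runs the scripts one after the other. This toy does not reflect how vLLM actually serializes or loads KV caches.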