# Multimodal JSONL Request Generator

Generates `.jsonl` benchmark files for [aiperf](https://github.com/NVIDIA/aiperf) with single-turn multimodal requests (text + images).

## Key concept: image pool reuse

Each request samples images from a fixed pool. A smaller pool relative to total image slots produces more cross-request image reuse, which is useful for benchmarking embedding cache hit rates.

For example, 500 requests x 3 images each = 1500 image slots. With `--images-pool 200`, many requests will share the same images.

## Image modes

| Mode | `--image-mode` | What goes in the JSONL | Who fetches the image |
|------|---------------|------------------------|----------------------|
| base64 (default) | `base64` | Absolute file paths to local PNGs | aiperf reads and base64-encodes before sending |
| HTTP | `http` | COCO test2017 URLs | The LLM server downloads images itself |

For `http` mode, download COCO annotations first:

```bash
mkdir -p annotations && cd annotations
wget http://images.cocodataset.org/annotations/image_info_test2017.zip
unzip image_info_test2017.zip
```

## Usage

```bash
# Defaults: 500 requests, 3 images each, all unique, base64 mode
python main.py

# HTTP mode with COCO URLs
python main.py --image-mode http

# Control reuse: 200 requests, pool of 100 unique images
python main.py -n 200 --images-pool 100

# More images per request
python main.py -n 100 --images-per-request 20 --images-pool 500
```

Output filename encodes the parameters, e.g. `500req_3img_200pool_300word_http.jsonl`.

## Running with aiperf

```bash
aiperf profile \
  --model Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 \
  --input-file 500req_3img_200pool_300word_http.jsonl \
  --custom-dataset-type single_turn \
  --shared-system-prompt-length 1000 \
  --extra-inputs "max_tokens:500" \
  --extra-inputs "min_tokens:500" \
  --extra-inputs "ignore_eos:true"
```

Note: the JSONL contains actual content (text + image references), not token counts. Do not pass `--isl`; it only applies to synthetic data generation.
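
## Estimating image reuse

To pick an `--images-pool` value for a target cache hit rate, the slot arithmetic from the image pool reuse section can be sketched in Python. This is only an illustration, not the generator's actual sampling code: the function names are hypothetical, and the simulation assumes each request draws its images uniformly without replacement from the pool, independently of other requests. The lower bound, by contrast, holds for any sampling scheme, since once every pool image has been used, each remaining slot must repeat one.

```python
import random


def min_reuse_fraction(num_requests, images_per_request, pool_size):
    """Lower bound on the fraction of image slots that repeat an earlier image.

    With `slots` total image slots and only `pool_size` distinct images,
    at least `slots - pool_size` slots must reuse an image (pigeonhole).
    """
    slots = num_requests * images_per_request
    if slots == 0:
        return 0.0
    return max(0, slots - pool_size) / slots


def simulate_reuse_fraction(num_requests, images_per_request, pool_size, seed=0):
    """Simulated fraction of slots whose image already appeared in the run.

    ASSUMPTION: uniform sampling without replacement within a request,
    independent across requests. The real script may sample differently.
    """
    rng = random.Random(seed)
    seen = set()
    repeats = 0
    for _ in range(num_requests):
        # Requires images_per_request <= pool_size.
        for image_id in rng.sample(range(pool_size), images_per_request):
            if image_id in seen:
                repeats += 1
            else:
                seen.add(image_id)
    slots = num_requests * images_per_request
    return repeats / slots if slots else 0.0


if __name__ == "__main__":
    # The example from this README: 500 requests x 3 images, pool of 200.
    print(f"lower bound: {min_reuse_fraction(500, 3, 200):.3f}")
    print(f"simulated:   {simulate_reuse_fraction(500, 3, 200):.3f}")
```

The simulated value can never fall below the bound. The two values sit close together when the pool is small relative to the slot count, and they diverge as the pool grows toward the number of slots.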