# OmniContext
As part of OmniGen2, we introduce a new benchmark for in-context generation, **OmniContext**, which aims to provide a more comprehensive evaluation of models' in-context generation abilities. It incorporates a diverse set of input images and instructions, and utilizes GPT-4.1 for interpretable, metric-driven assessment.
*Figure: Overview of the OmniContext benchmark.*

*Figure: An illustrative evaluation case in the OmniContext benchmark.*
Evaluating models on the OmniContext benchmark involves the following steps:
## Step 1: Environment Setup
```bash
# 1. Activate Python environment
conda activate omnigen2
# 2. Install dependencies
pip install -U datasets megfile
```
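Optionally, verify that the dependencies installed correctly with a quick import check (a minimal sanity check, not part of the benchmark scripts):
```bash
# Verify that both dependencies import without errors.
python -c "import datasets, megfile; print('environment OK')"
```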
## Step 2: Generate Images
Note: we fix the output image resolution at 1024 × 1024 to keep settings consistent across models.
You may generate results with OmniGen2 or any other model; just make sure the output directory structure and file format match the layout specified below.
```
results/
├── {method_name}/
│   └── fullset/
│       └── {task_type}/
│           ├── key1.png
│           ├── key2.png
│           └── ...
```
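For a custom model, you would create this layout yourself, for example (illustrative only; `my_model` and `single_object` are placeholder names, and the actual task types and keys come from the OmniContext test data):
```bash
# Illustrative only: place outputs for a custom method into the required layout.
mkdir -p results/my_model/fullset/single_object
cp /path/to/generated.png results/my_model/fullset/single_object/key1.png
```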
To use OmniGen2, you can run the following script to generate images:
```bash
cd OmniGen2
accelerate launch --num_processes=8 -m omnicontext.inference \
  --model_path "OmniGen2/OmniGen2" \
  --model_name "OmniGen2" \
  --test_data "OmniGen2/OmniContext" \
  --result_dir "omnicontext/results" \
  --num_inference_step 50 \
  --height 1024 \
  --width 1024 \
  --text_guidance_scale 5.0 \
  --image_guidance_scale 2.0 \
  --num_images_per_prompt 1 \
  --disable_align_res # align_res matches the output resolution to the input image for editing tasks; disable it for in-context generation tasks.
```
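The command above launches eight processes; adjust `--num_processes` to your GPU count. After inference finishes, you can confirm that every task type produced images (a hypothetical check, assuming the paths from the command above):
```bash
# Count generated images per task type under the result directory.
for d in omnicontext/results/OmniGen2/fullset/*/; do
  echo "$d: $(find "$d" -name '*.png' | wc -l) images"
done
```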
## Step 3: Evaluation
1. We use GPT-4.1 to evaluate the quality of the generated images. Please make sure to set up your API key before running the script.
```bash
cd OmniGen2
openai_key=""
python -m omnicontext.test_omnicontext_score \
--test_data "OmniGen2/OmniContext" \
--result_dir "omnicontext/results" \
--model_name "OmniGen2" \
--openai_key ${openai_key} \
--max_workers 100
```
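If scoring requests fail immediately, it is worth confirming the key itself is valid first; the snippet below is a generic OpenAI API check, not part of the benchmark code:
```bash
# Sanity check: a valid key should return a JSON list of available models.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer ${openai_key}" | head -n 5
```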
2. Next, calculate the final score:
```bash
python -m omnicontext.calculate_statistics \
  --save_path "omnicontext/results" \
  --model_name "OmniGen2" \
  --backbone gpt4dot1
```
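For reference, here is a minimal sketch of how the per-sample scores combine, assuming the overall score is the geometric mean of GPT-4.1's prompt-following (PF) and subject-consistency (SC) ratings as described in the OmniGen2 report; the authoritative computation is `omnicontext.calculate_statistics`:
```bash
# Illustrative only: overall = sqrt(PF * SC) on a 0-10 scale (assumed aggregation).
python -c "import math; pf, sc = 8.0, 7.5; print(f'overall: {math.sqrt(pf*sc):.2f}')"
```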
## Acknowledgements
The code structure of this benchmark is inspired by [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit).
Special thanks to the original authors for their valuable contribution.