Commit 109f0842 authored by chenzk

v1.0

{"task_type": "t2i", "instruction": "A big tree is in the forest", "output_image": "/path/to/your/data/t2i/0.png"}
{"task_type": "t2i", "instruction": "a dog is running on grass", "output_image": "/path/to/your/data/t2i/1.png"}
data:
  -
    path: "data_configs/example/t2i/jsonls/0.jsonl"
    type: "t2i"
    ratio: !!float 1
  -
    path: "data_configs/example/t2i/jsonls/1.jsonl"
    type: "t2i"
    ratio: !!float 1
doc/02.png (binary image, 2.99 MB)
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04.1-py3.10
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk25.04.1/env.sh
# Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# torch==2.6.0
# torchvision==0.21.0
timm
einops
accelerate
transformers==4.51.3
diffusers
opencv-python-headless
scipy
wandb
matplotlib
Pillow
tqdm
omegaconf
python-dotenv
ninja
ipykernel
wheel
# triton-windows; sys_platform == "win32"
## 🎨 Fine-Tuning OmniGen2
You can fine-tune OmniGen2 to customize its capabilities, enhance its performance on specific tasks, or address potential limitations.
We provide a training script that supports multi-GPU and multi-node distributed training using **PyTorch FSDP (Fully Sharded Data Parallel)**. Both full-parameter fine-tuning and **LoRA (Low-Rank Adaptation)** are supported out of the box.
### 1. Preparation
Before launching the training, you need to prepare the following configuration files.
#### Step 1: Set Up the Training Configuration
This is a YAML file that specifies crucial parameters for your training job, including the model architecture, optimizer, dataset paths, and validation settings.
We provide two templates to get you started:
* **Full-Parameter Fine-Tuning:** `options/ft.yml`
* **LoRA Fine-Tuning:** `options/ft_lora.yml`
Copy one of these templates and modify it according to your needs. Below are some of the most important parameters you may want to adjust (a sketch of how they fit together follows the list):
- `name`: The experiment name. This is used to create a directory for logs and saved model weights (e.g., `experiments/your_exp_name`).
- `data.data_path`: Path to the data configuration file that defines your training data sources and mixing ratios.
- `data.max_output_pixels`: The maximum number of pixels for an output image. Larger images will be downsampled while maintaining their aspect ratio.
- `data.max_input_pixels`: A list specifying the maximum pixel count for input images, with entries corresponding to samples that have one, two, three, or more input images.
- `data.max_side_length`: The maximum side length for any image (input or output). Images exceeding this will be downsampled while maintaining their aspect ratio.
- `train.global_batch_size`: The total batch size across all GPUs. This should equal `batch_size` × `(number of GPUs)`.
- `train.batch_size`: The batch size per GPU.
- `train.max_train_steps`: The total number of training steps to run.
- `train.learning_rate`: The learning rate for the optimizer. **Note:** This often requires tuning based on your dataset size and whether you are using LoRA. We recommend using a lower learning rate for full-parameter fine-tuning.
- `logger.log_with`: Specify which loggers to use for monitoring training (e.g., `tensorboard`, `wandb`).
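For orientation, here is a minimal, hypothetical sketch of how these keys could fit together in a copied config. The key names and nesting follow the dotted paths listed above; all values are placeholders and the real templates may contain additional keys, so start from `options/ft.yml` or `options/ft_lora.yml` rather than from this snippet.
```yaml
# Sketch only: key names/nesting taken from the list above; values are illustrative.
name: your_exp_name                            # artifacts go to experiments/your_exp_name
data:
  data_path: data_configs/train/example/mix.yml
  max_output_pixels: 1048576                   # illustrative (~1024 x 1024)
  max_input_pixels: [1048576, 262144, 262144]  # illustrative limits for 1, 2, 3+ inputs
  max_side_length: 2048                        # illustrative
train:
  global_batch_size: 8                         # must equal batch_size x number of GPUs
  batch_size: 1
  max_train_steps: 10000                       # illustrative
  learning_rate: 1.0e-5                        # prefer lower values for full-parameter fine-tuning
logger:
  log_with: tensorboard                        # or wandb
```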
#### Step 2: Configure Your Dataset
The data configuration consists of a set of `yaml` and `jsonl` files.
* The `.yml` file defines the mixing ratios for different data sources.
* The `.jsonl` files contain the actual data entries, with each line representing a single data sample.
For a practical example, please refer to `data_configs/train/example/mix.yml`.
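As a sketch, a mixing config lists each `.jsonl` source together with its task type and sampling ratio; the structure below mirrors the example T2I config included in this commit (the paths are placeholders to replace with your own):
```yaml
data:
  -
    path: "data_configs/example/t2i/jsonls/0.jsonl"
    type: "t2i"
    ratio: !!float 1
  -
    path: "data_configs/example/t2i/jsonls/1.jsonl"
    type: "t2i"
    ratio: !!float 1
```
Here `ratio` controls how heavily a source is sampled relative to the other sources when the data are mixed.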
Each line in a `.jsonl` file describes a sample, generally following this format:
```json
{
    "task_type": "edit",
    "instruction": "add a hat to the person",
    "input_images": ["/path/to/your/data/edit/input1.png", "/path/to/your/data/edit/input2.png"],
    "output_image": "/path/to/your/data/edit/output.png"
}
```
*Note: The `input_images` field can be omitted for text-to-image (T2I) tasks.*
#### Step 3: Review the Training Scripts
We provide convenient shell scripts to handle the complexities of launching distributed training jobs. You can use them directly or adapt them for your environment.
* **For Full-Parameter Fine-Tuning:** `scripts/train/ft.sh`
* **For LoRA Fine-Tuning:** `scripts/train/ft_lora.sh`
---
### 2. 🚀 Launching the Training
Once your configuration is ready, you can launch the training script. All experiment artifacts, including logs and checkpoints, will be saved in `experiments/${experiment_name}`.
#### Multi-Node / Multi-GPU Training
For distributed training across multiple nodes or GPUs, you need to provide environment variables to coordinate the processes.
```shell
# Example for full-parameter fine-tuning
bash scripts/train/ft.sh --rank=$RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT --world_size=$WORLD_SIZE
```
#### Single-Node Training
If you are training on a single machine (with one or more GPUs), you can omit the distributed arguments. The script will handle the setup automatically.
```shell
# Example for full-parameter fine-tuning on a single node
bash scripts/train/ft.sh
```
> **⚠️ Note on LoRA Checkpoints:**
> Currently, when training with LoRA, the script saves the entire model's parameters (including the frozen base model weights) in the checkpoint. This is due to a limitation in easily extracting only the LoRA-related parameters when using FSDP. The conversion script in the next step will correctly handle this.
---
### 3. 🖼️ Inference with Your Fine-Tuned Model
After training, you must convert the FSDP-saved checkpoint (`.bin`) into the standard Hugging Face format before you can use it for inference.
#### Step 1: Convert the Checkpoint
We provide a conversion script that automatically handles both full-parameter and LoRA checkpoints.
**For a full fine-tuned model:**
```shell
python convert_ckpt_to_hf_format.py \
--config_path experiments/ft/ft.yml \
--model_path experiments/ft/checkpoint-10/pytorch_model_fsdp.bin \
--save_path experiments/ft/checkpoint-10/transformer
```
**For a LoRA fine-tuned model:**
```shell
python convert_ckpt_to_hf_format.py \
--config_path experiments/ft_lora/ft_lora.yml \
--model_path experiments/ft_lora/checkpoint-10/pytorch_model_fsdp.bin \
--save_path experiments/ft_lora/checkpoint-10/transformer_lora
```
#### Step 2: Run Inference
Now, you can run the inference script, pointing it to the path of your converted model weights.
**Using a full fine-tuned model:**
Pass the converted model path to the `--transformer_path` argument.
```shell
python inference.py \
--model_path "OmniGen2/OmniGen2" \
--num_inference_step 50 \
--height 1024 \
--width 1024 \
--text_guidance_scale 4.0 \
--instruction "A crystal ladybug on a dewy rose petal in an early morning garden, macro lens." \
--output_image_path outputs/output_t2i_finetuned.png \
--num_images_per_prompt 1 \
--transformer_path experiments/ft/checkpoint-10/transformer
```
**Using a LoRA fine-tuned model:**
Pass the converted LoRA weights path to the `--transformer_lora_path` argument.
```shell
python inference.py \
--model_path "OmniGen2/OmniGen2" \
--num_inference_step 50 \
--height 1024 \
--width 1024 \
--text_guidance_scale 4.0 \
--instruction "A crystal ladybug on a dewy rose petal in an early morning garden, macro lens." \
--output_image_path outputs/output_t2i_lora.png \
--num_images_per_prompt 1 \
--transformer_lora_path experiments/ft_lora/checkpoint-10/transformer_lora
```
#!/bin/bash
SHELL_FOLDER=$(cd "$(dirname "$0")"; pwd)
cd "$SHELL_FOLDER"
model_path="OmniGen2/OmniGen2"
python inference.py \
--model_path $model_path \
--num_inference_step 50 \
--text_guidance_scale 5.0 \
--image_guidance_scale 2.0 \
--instruction "Change the background to classroom." \
--input_image_path example_images/ComfyUI_temp_mllvz_00071_.png \
--output_image_path outputs/output_edit.png \
--num_images_per_prompt 1
#!/bin/bash
SHELL_FOLDER=$(cd "$(dirname "$0")"; pwd)
cd "$SHELL_FOLDER"
model_path="OmniGen2/OmniGen2"
python inference.py \
--model_path $model_path \
--num_inference_step 30 \
--text_guidance_scale 5.0 \
--image_guidance_scale 2.0 \
--instruction "Change the color of dress to light green." \
--input_image_path example_images/flux5.png \
--output_image_path outputs/prompt_guide_edit_1.png \
--num_images_per_prompt 4 \
--cfg_range_end 0.8 \
--scheduler dpmsolver++
model_path="OmniGen2/OmniGen2"
python inference.py \
--model_path $model_path \
--num_inference_step 50 \
--text_guidance_scale 5.0 \
--image_guidance_scale 2.0 \
--instruction "Change the color of dress to light green." \
--input_image_path example_images/flux5.png \
--output_image_path outputs/prompt_guide_edit_2.png \
--num_images_per_prompt 4
#!/bin/bash
SHELL_FOLDER=$(cd "$(dirname "$0")"; pwd)
cd "$SHELL_FOLDER"
model_path="OmniGen2/OmniGen2"
python inference.py \
--model_path $model_path \
--num_inference_step 50 \
--text_guidance_scale 5.0 \
--image_guidance_scale 2.0 \
--instruction "Change the color of dress to light green." \
--input_image_path example_images/flux5.png \
--output_image_path outputs/prompt_guide_edit_0.png \
--num_images_per_prompt 4
#!/bin/bash
SHELL_FOLDER=$(cd "$(dirname "$0")"; pwd)
cd "$SHELL_FOLDER"
model_path="OmniGen2/OmniGen2"
python inference.py \
--model_path $model_path \
--num_inference_step 40 \
--text_guidance_scale 5.0 \
--image_guidance_scale 2.0 \
--instruction "Change the color of dress to light green." \
--input_image_path example_images/flux5.png \
--output_image_path outputs/prompt_guide_edit_1.png \
--num_images_per_prompt 4