Commit f0ab1d9a authored by wangwf's avatar wangwf

init

parent 1c4d8c24
accelerate>=0.20.0
diffusers>=0.34.0
huggingface-hub<1.0,>=0.34.0
numpy>=1.24.3,<2.0.0
onnx
onnxruntime>=1.22.1
pillow
prettytable
tokenizers<0.22,>=0.21
torch>=2.5.1
transformers>=4.54.1
# First argument: directory containing the rocBLAS library, prepended to LD_LIBRARY_PATH below.
rocblas_lib_path=$1
export MIGRAPHX_ENABLE_MIOPEN_GROUPNORM=1
export MIGRAPHX_ENABLE_NHWC=1
export MIGRAPHX_ENABLE_MIOPEN_CONCAT=1
export MIGRAPHX_STABLEDIFFUSION_OPT=1
export MIGRAPHX_ENABLE_MIOPEN_GN_LN=1
export MIGRAPHX_ENABLE_LAYERNORM_FUSION=1
# export PADDING_MALLOC=0 # run on KME
export HIP_VISIBLE_DEVICES=6
export LD_LIBRARY_PATH=${rocblas_lib_path}:$LD_LIBRARY_PATH
## Load and run a pipeline in a generic way
```bash
python tools/run_pipe.py -m /path/to/sd2.1_models -p "sunflower"
```
Script arguments:
| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `-m` / `--model-dir` | **Required**. Path to the pipeline model directory | str | None |
| `--force-compile` | Optional. Force recompilation of the model | bool | False |
| `--num-images-per-prompt` | Optional. Number of images generated per prompt | int | 1 |
| `--img-size` | Optional. Output image size; if unset, each pipeline's default image size is used | int | None |
| `-p` / `--prompt` | **Required**. Prompt describing the image content, style, generation requirements, etc. | str | None |
| `-n` / `--negative-prompt` | Optional. Negative prompt, e.g. "ugly" | str | None |
| `-t` / `--num-inference-steps` | Optional. Number of inference steps per image | int | 50 |
| `--seed` | Optional. Random seed | int | 42 |
| `--save-prefix` | Optional. Filename prefix for saved images | str | None |
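For reference, the options above imply an argparse interface roughly like the following sketch (names and defaults are taken from the table; the actual parser in `tools/run_pipe.py` may differ in details):

```python
import argparse

def build_parser():
    # Mirrors the documented run_pipe.py options; a sketch, not the real parser.
    p = argparse.ArgumentParser(description="run_pipe sketch")
    p.add_argument("-m", "--model-dir", type=str, required=True,
                   help="Path to the pipeline model directory.")
    p.add_argument("--force-compile", action="store_true",
                   help="Force recompilation of the model.")
    p.add_argument("--num-images-per-prompt", type=int, default=1)
    p.add_argument("--img-size", type=int, default=None)
    p.add_argument("-p", "--prompt", type=str, required=True)
    p.add_argument("-n", "--negative-prompt", type=str, default=None)
    p.add_argument("-t", "--num-inference-steps", type=int, default=50)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--save-prefix", type=str, default=None)
    return p

# Parse the same invocation shown in the bash snippet above.
args = build_parser().parse_args(["-m", "/path/to/sd2.1_models", "-p", "sunflower"])
print(args.prompt, args.num_inference_steps, args.seed)  # sunflower 50 42
```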
## Batch image generation
Multiple prompts and negative prompts covering different themes can be collected in a single JSON file and passed to `tools/run_examples.py` to generate images in batch.
```bash
python tools/run_examples.py \
-m /path/to/sd2.1_models \
--examples-json examples/prompts_and_negative_prompts.json \
--output-dir examples/sd2.1-images-512
```
Script arguments:
| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `-m` / `--model-dir` | **Required**. Path to the pipeline model directory | str | None |
| `--force-compile` | Optional. Force recompilation of the model | bool | False |
| `--num-images-per-prompt` | Optional. Number of images generated per prompt | int | 1 |
| `--img-size` | Optional. Output image size; if unset, each pipeline's default image size is used | int | None |
| `-t` / `--num-inference-steps` | Optional. Number of inference steps per image | int | 50 |
| `--seed` | Optional. Random seed | int | 42 |
| `--examples-json` | Optional. File of prompts and negative prompts | str | examples/prompts_and_negative_prompts.json |
| `--output-dir` | Optional. Directory where generated images are saved | str | None |
The prompt/negative-prompt file has the following format:
```json
[
    {
        "theme": "theme0 name",
        "examples": [
            {
                "prompt": "prompt0 text here",
                "negative_prompt": "negative_prompt0 text here"
            },
            {
                "prompt": "prompt1 text here",
                "negative_prompt": "negative_prompt1 text here"
            },
            ...
        ]
    },
    {
        "theme": "theme1 name",
        "examples": [
            {
                "prompt": "prompt0 text here",
                "negative_prompt": "negative_prompt0 text here"
            },
            {
                "prompt": "prompt1 text here",
                "negative_prompt": "negative_prompt1 text here"
            },
            ...
        ]
    },
    ...
]
```
Example: [../examples/prompts_and_negative_prompts.json](../examples/prompts_and_negative_prompts.json)
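A sketch of how such a file can be loaded and flattened for batch generation (the inline JSON below is a stand-in for the real example file):

```python
import json

# Inline stand-in for examples/prompts_and_negative_prompts.json.
raw = """
[
    {
        "theme": "animals",
        "examples": [
            {"prompt": "a cat", "negative_prompt": "ugly"},
            {"prompt": "a dog", "negative_prompt": "blurry"}
        ]
    }
]
"""

# Flatten the nested theme/example structure into (theme, prompt, negative) tuples.
pairs = []
for group in json.loads(raw):
    for ex in group["examples"]:
        pairs.append((group["theme"], ex["prompt"], ex["negative_prompt"]))
print(pairs[0])  # ('animals', 'a cat', 'ugly')
```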
## Per-module latency statistics
This project supports non-intrusive instrumentation for measuring per-module latency. The rough workflow is:
1. Create a timer;
2. Register the functions or methods to be timed with the timer;
3. Start the timer;
4. Run the registered functions or methods;
5. Print the statistics.
A minimal usage example:
```python
import random
import time

from migraphx_diffusers import AutoTimer


def sleep_func(sleep_seconds=1):
    time.sleep(sleep_seconds)


class SleepClass:
    def __init__(self):
        self.min_seconds = 1
        self.max_seconds = 5

    def random_sleep(self):
        time.sleep(random.randint(self.min_seconds, self.max_seconds))

    def __call__(self, sleep_seconds=1):
        time.sleep(sleep_seconds)


obj = SleepClass()
t = AutoTimer()  # step 1
# step 2
t.add_target(sleep_func, key="sleep_func")
t.add_target(obj.random_sleep, key="random_sleep")
t.add_target(obj, key="__call__")
t.start_work()  # step 3
# step 4
for i in range(10):
    sleep_func()
    obj()
    if i % 3 == 0:
        obj.random_sleep()
t.summary()  # step 5
```
The output looks like this (column headers translated):
```
+--------------------------------------------------------------------+
|                            Test Latency                            |
+--------------+------+----------+----------+-----------+------------+
| Module       | Runs | Max (ms) | Min (ms) | Mean (ms) | Mean (fps) |
+--------------+------+----------+----------+-----------+------------+
| sleep_func   | 10   | 1001.06  | 1001.02  | 1001.04   | 1.0        |
| __call__     | 10   | 1001.07  | 1000.06  | 1000.94   | 1.0        |
| random_sleep | 4    | 4004.1   | 1001.05  | 2252.33   | 0.44       |
+--------------+------+----------+----------+-----------+------------+
```
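The last column follows from the mean latency, assuming throughput is computed as `1000 / mean_ms` and rounded to two decimals:

```python
def mean_fps(mean_latency_ms):
    # Average throughput (calls per second) from average latency in milliseconds.
    return round(1000.0 / mean_latency_ms, 2)

print(mean_fps(1001.04), mean_fps(2252.33))  # 1.0 0.44
```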
Collect end-to-end and per-component performance data for sdxl or sd2.1:
```bash
python tools/time_count.py -m /path/to/sd2.1_models
```
Script arguments:
| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `-m` / `--model-dir` | **Required**. Path to the model directory (sdxl or sd2.1) | str | None |
| `--force-compile` | Optional. Force recompilation of the model | bool | False |
| `--num-images-per-prompt` | Optional. Number of images generated per prompt | int | 1 |
| `--img-size` | Optional. Output image size; if unset, each pipeline's default image size is used | int | None |
| `-t` / `--num-inference-steps` | Optional. Number of inference steps per image | int | 50 |
| `--num-warmup-loops` | Optional. Number of warmup iterations | int | 1 |
| `--num-count-loops` | Optional. Number of measured iterations | int | 100 |
| `--out-csv-file` | Optional. Path of the CSV file where performance data is saved | str | ./perf-{date}-{time}.csv |
## SD2.1 end-to-end performance test
```bash
python tools/run_sd2_1.py /path/to/sd2.1_models
```
Script arguments:
| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `model-dir` | **Positional**. Path to the sd2.1 model directory | str | None |
| `--result-dir` | Optional. Directory where generated images are stored | str | ./results |
The test scenarios are:
+ batchsize: 1, 2, 4, 8
+ image_size: 512
+ num_inference_steps: 20
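The sweep above boils down to a loop over batch sizes with the other parameters fixed; a sketch (the `run_once` helper is hypothetical, standing in for one pipeline invocation inside `tools/run_sd2_1.py`):

```python
def run_once(batch_size, image_size=512, num_inference_steps=20):
    # Hypothetical stand-in for a single pipeline run; just records its config.
    return {"batch_size": batch_size,
            "image_size": image_size,
            "num_inference_steps": num_inference_steps}

# One run per batch size in the test matrix.
configs = [run_once(bs) for bs in (1, 2, 4, 8)]
print([c["batch_size"] for c in configs])  # [1, 2, 4, 8]
```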
## Model accuracy evaluation
Text-to-image models are commonly evaluated with the CLIP score. First, prepare the dataset and the multimodal scoring model:
```bash
# Download the dataset
wget https://raw.githubusercontent.com/google-research/parti/main/PartiPrompts.tsv --no-check-certificate
# Download the model
mkdir ./openai
huggingface-cli download openai/clip-vit-base-patch16 --local-dir ./openai/clip-vit-base-patch16 --local-dir-use-symlinks False
```
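PartiPrompts.tsv is a tab-separated file; it can be read with the standard library along these lines (the column names `Prompt` and `Category` are assumptions about the file's header, and the inline rows are stand-ins for the real data):

```python
import csv
import io

# Inline stand-in for a slice of PartiPrompts.tsv.
tsv_text = (
    "Prompt\tCategory\n"
    "a sunflower in a vase\tProduce & Plants\n"
    "a red sports car\tVehicles\n"
)

# Group prompts by their category, as the evaluation script does per directory.
by_category = {}
for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
    by_category.setdefault(row["Category"], []).append(row["Prompt"])
print(by_category["Vehicles"])  # ['a red sports car']
```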
Generate images from the prompts in the dataset:
```bash
python tools/gen_p2_images.py -m /path/to/sd2.1_models --num-images-per-prompt 4 -p ./PartiPrompts.tsv --save-dir ./sd2.1_p2_images
```
Evaluate the generated results:
```bash
python tools/evaluate.py -m ./openai/clip-vit-base-patch16 -d ./sd2.1_p2_images
```
The accuracy results are as follows:
| Category | NumPrompts | NumImages | MeanCLIPScore (A800) | MeanCLIPScore (BW) |
| ---------------- | ----------- | ---------- | -------------------- | ------------------ |
| All | 1632 | 6528 | 33.1704 | 33.1719 |
| Animals | 314 | 1256 | 34.2495 | 34.2835 |
| Food & Beverage | 74 | 296 | 30.624 | 30.6977 |
| Abstract | 51 | 204 | 26.8288 | 27.0132 |
| Arts | 66 | 264 | 36.5361 | 36.591 |
| People | 177 | 708 | 32.8995 | 32.911 |
| Vehicles | 104 | 416 | 32.6134 | 32.57 |
| Outdoor Scenes | 131 | 524 | 32.5131 | 32.4758 |
| World Knowledge | 214 | 856 | 34.5557 | 34.5407 |
| Artifacts | 287 | 1148 | 32.8752 | 32.847 |
| Indoor Scenes | 40 | 160 | 33.4905 | 33.5314 |
| Produce & Plants | 50 | 200 | 32.0918 | 32.069 |
| Illustrations | 124 | 496 | 32.9467 | 32.8778 |
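As a sanity check on the table, the `All` row of the A800 column equals the per-category means weighted by prompt counts:

```python
# (num_prompts, mean_clip_score) per category, A800 column of the table above.
categories = [
    (314, 34.2495), (74, 30.624), (51, 26.8288), (66, 36.5361),
    (177, 32.8995), (104, 32.6134), (131, 32.5131), (214, 34.5557),
    (287, 32.8752), (40, 33.4905), (50, 32.0918), (124, 32.9467),
]
total_prompts = sum(n for n, _ in categories)
weighted_mean = sum(n * s for n, s in categories) / total_prompts
print(total_prompts, round(weighted_mean, 4))  # 1632 33.1704
```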
from collections import defaultdict
import json
import os
import os.path as osp

import cv2
import numpy as np
from prettytable import PrettyTable
import torch
import tqdm
from torchmetrics.multimodal import CLIPScore
from torchmetrics.functional.multimodal.clip_score import _clip_score_update


class P2CLIPScore(CLIPScore):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.category2scores = defaultdict(list)
        self.category2nprompts = defaultdict(int)
        self.category2nimages = defaultdict(int)

    def process(self, p2_images_dir):
        prompt_dirs = []
        for cat_dir_name in os.listdir(p2_images_dir):
            for prompt_dir_name in os.listdir(osp.join(p2_images_dir, cat_dir_name)):
                prompt_dir = osp.join(p2_images_dir, cat_dir_name, prompt_dir_name)
                prompt_dirs.append(prompt_dir)
        print("Processing...")
        for prompt_dir in tqdm.tqdm(prompt_dirs):
            prompt_json = osp.join(prompt_dir, "prompt_info.json")
            with open(prompt_json, "r") as f:
                prompt_info = json.load(f)
            category = prompt_info["category"]
            cat_dir_name = prompt_dir.split("/")[-2]
            assert cat_dir_name == category.replace(" ", "").replace("&", "_")
            imgs = []
            for file_name in os.listdir(prompt_dir):
                if not file_name.endswith(".png"):
                    continue
                image_path = osp.join(prompt_dir, file_name)
                img = cv2.imread(image_path)[None, ...]
                imgs.append(img)
            assert len(imgs) >= 1
            scores, _ = _clip_score_update(
                [prompt_info["prompt_text"]] * len(imgs),
                torch.from_numpy(np.concatenate(imgs, 0).transpose(0, 3, 1, 2)),
                self.model,
                self.processor
            )
            # self.category2scores["All"].extend(scores.detach().numpy().tolist())
            # self.category2scores[category].extend(scores.detach().numpy().tolist())
            self.category2scores["All"].append(scores.max().item())
            self.category2scores[category].append(scores.max().item())
            self.category2nprompts["All"] += 1
            self.category2nprompts[category] += 1
            self.category2nimages["All"] += len(imgs)
            self.category2nimages[category] += len(imgs)

    def compute(self, output_json=None):
        pt = PrettyTable()
        pt.title = "Evaluation Results of PartiPrompts Dataset"
        pt.field_names = ["Category", "Num Prompts", "Num Images", "Mean CLIP Score"]
        for category, scores in self.category2scores.items():
            num_prompts = self.category2nprompts[category]
            num_images = self.category2nimages[category]
            mean_score = sum(scores) / len(scores)
            pt.add_row([category, num_prompts, num_images, round(mean_score, 4)])
        print(pt)
        if output_json is not None:
            with open(output_json, "w") as f:
                f.write(pt.get_json_string())


def main():
    import argparse
    parser = argparse.ArgumentParser(
        "Evaluate text2image results of PartiPrompts dataset")
    parser.add_argument("-m", "--model-dir",
                        type=str,
                        required=True,
                        help="The path to the model directory.")
    parser.add_argument("-d", "--data-dir",
                        type=str,
                        required=True,
                        help="The path to the evaluation data directory.")
    parser.add_argument("-o", "--output-json",
                        type=str,
                        default=None,
                        help="Output json file path.")
    args = parser.parse_args()
    p2_clip_score = P2CLIPScore(args.model_dir)
    p2_clip_score.process(args.data_dir)
    p2_clip_score.compute(args.output_json)


if __name__ == "__main__":
    main()