# Nunchaku INT4 FLUX.1 Models

## Text-to-Image Gradio Demo

![demo](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/app/flux.1/t2i/assets/demo.jpg)

This interactive Gradio application generates an image from your text prompt. The base model can be either [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev).

To launch the application, simply run:

```shell
python run_gradio.py
```

- The demo defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model.
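The flags above can be sketched with `argparse`. The following is a hypothetical reconstruction of how the demo might parse them, using only the flag names and defaults stated in this README; the real `run_gradio.py` may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the demo CLI; flag names taken from this README.
    parser = argparse.ArgumentParser(description="Nunchaku FLUX.1 text-to-image demo")
    parser.add_argument("-m", "--model", choices=["schnell", "dev"], default="schnell",
                        help="base model: FLUX.1-schnell (default) or FLUX.1-dev")
    parser.add_argument("--no-safety-checker", action="store_true",
                        help="skip loading the Gemma-2B safety checker to save GPU memory")
    parser.add_argument("--use-qencoder", action="store_true",
                        help="use the W4A16 quantized text encoder")
    parser.add_argument("-p", "--precisions", nargs="+", choices=["int4", "bf16"],
                        default=["int4"],
                        help="which DiT precisions to load (e.g. 'int4 bf16' for comparison)")
    return parser
```

For example, `python run_gradio.py -m dev -p int4 bf16` would load the FLUX.1-dev model with both DiTs for side-by-side comparison.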

## Command Line Inference

We provide a script, [generate.py](generate.py), that generates an image from a text prompt directly from the command line, similar to the demo. Simply run:

```shell
python generate.py --prompt "Your Text Prompt"
```

- The generated image will be saved as `output.png` by default. You can specify a different path using the `-o` or `--output-path` options.
- The script defaults to using the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.

- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.

- You can specify `--use-qencoder` to use our W4A16 text encoder.

- You can adjust the number of inference steps and guidance scale with `-t` and `-g`, respectively. For the FLUX.1-schnell model, the defaults are 4 steps and a guidance scale of 0; for the FLUX.1-dev model, the defaults are 50 steps and a guidance scale of 3.5.

- When using the FLUX.1-dev model, you also have the option to load a LoRA adapter with `--lora-name`. Available choices are `None`, [`Anime`](https://huggingface.co/alvdansen/sonny-anime-fixed), [`GHIBSKY Illustration`](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration), [`Realism`](https://huggingface.co/XLabs-AI/flux-RealismLora), [`Children Sketch`](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch), and [`Yarn Art`](https://huggingface.co/linoyts/yarn_art_Flux_LoRA), with the default set to `None`. You can also specify the LoRA weight with `--lora-weight`, which defaults to 1.
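The per-model defaults above can be collected into a small lookup. A minimal sketch, with the values taken from this README (the function name is hypothetical, not part of `generate.py`):

```python
# Inference defaults per base model, as listed in this README.
DEFAULTS = {
    "schnell": {"steps": 4, "guidance_scale": 0.0},
    "dev": {"steps": 50, "guidance_scale": 3.5},
}

def resolve_defaults(model: str, steps=None, guidance_scale=None):
    """Fill in unspecified -t/-g values from the per-model defaults."""
    base = DEFAULTS[model]
    return (
        base["steps"] if steps is None else steps,
        base["guidance_scale"] if guidance_scale is None else guidance_scale,
    )
```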

## Latency Benchmark

To measure the latency of our INT4 models, use the following command:

```shell
python latency.py
```

- The script defaults to the INT4 FLUX.1-schnell model. To switch to FLUX.1-dev, use the `-m dev` option. For BF16 precision, add `-p bf16`.
- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup-times` and `--test-times`. The defaults are 2 warmup runs and 10 test runs.
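The warmup/test-run structure is the standard benchmarking pattern: a few untimed runs absorb one-time costs (kernel compilation, caches), then the timed runs are averaged. A minimal sketch of that pattern; the actual `latency.py` may time differently (e.g. with CUDA events):

```python
import time

def benchmark(fn, warmup_times: int = 2, test_times: int = 10) -> float:
    """Return the mean latency of fn() in seconds, after warmup runs."""
    for _ in range(warmup_times):
        fn()  # untimed: absorb compilation/caching overhead
    start = time.perf_counter()
    for _ in range(test_times):
        fn()
    return (time.perf_counter() - start) / test_times
```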

## Quality Results

Below are the steps to reproduce the quality metrics reported in our paper. First, install a few additional packages needed for the image quality metrics:

```shell
pip install clean-fid torchmetrics image-reward clip datasets
```

Then generate images using both the original BF16 model and our INT4 model on the [MJHQ](https://huggingface.co/datasets/playgroundai/MJHQ-30K) and [DCI](https://github.com/facebookresearch/DCI) datasets:

```shell
python evaluate.py -p int4
python evaluate.py -p bf16
```

- The commands above will generate images from FLUX.1-schnell on both datasets. Use `-m dev` to switch to FLUX.1-dev, or specify a single dataset with `-d MJHQ` or `-d DCI`.
- By default, generated images are saved to `results/$MODEL/$PRECISION`. Customize the output path using the `-o` option if desired.
- You can also adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- To accelerate generation, you can distribute the workload across multiple GPUs. For instance, with $N$ GPUs, add the options `--chunk-start $i --chunk-step $N` on GPU $i$ ($0 \le i < N$). This ensures each GPU handles a distinct portion of the workload.
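The `--chunk-start`/`--chunk-step` options amount to strided index partitioning: worker $i$ takes prompts $i, i+N, i+2N, \dots$. A minimal sketch of the assumed scheme (the helper name is illustrative, not from `evaluate.py`):

```python
def chunk_indices(total: int, chunk_start: int, chunk_step: int) -> list[int]:
    """Prompt indices handled by one worker under strided partitioning."""
    return list(range(chunk_start, total, chunk_step))
```

With 4 GPUs and 10 prompts, GPU 1 would handle prompts 1, 5, and 9; together the workers cover every prompt exactly once.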

Finally, compute the metrics for the generated images with

```shell
python get_metrics.py results/schnell/int4 results/schnell/bf16
```

Remember to replace the example paths with the actual paths to your image folders.

**Notes:**

- The script will calculate quality metrics (CLIP IQA, CLIP Score, Image Reward, FID) only for the first folder specified. Ensure the INT4 results folder is listed first.
- **Similarity Metrics**: If a second folder path is not provided, similarity metrics (LPIPS, PSNR, SSIM) will be skipped.
- **Output File**: Metric results are saved in `metrics.json` by default. Use `-o` to specify a custom output file if needed.
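As a concrete example of one similarity metric, PSNR compares paired images pixel by pixel: higher values mean the INT4 outputs are closer to the BF16 references. A minimal NumPy sketch (`get_metrics.py` likely uses a torchmetrics implementation instead):

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```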