description: Create a report to help us reproduce and fix the bug
title: "[Bug]"
labels: ['Bug']
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. I have searched for related issues and FAQs (https://github.com/mit-han-lab/nunchaku/blob/main/docs/faq.md) but was unable to find a solution.
        - label: 2. The issue persists in the latest version.
        - label: 3. Please note that without environment information and a minimal reproducible example, it will be difficult for us to reproduce and address the issue, which may delay our response.
        - label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
        - label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
        - label: 6. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Describe the Bug
      description: Provide a clear and concise explanation of the bug you encountered.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Environment
      description: |
        Please include relevant environment details such as your system specifications, Python version, PyTorch version, and CUDA version.
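        For example, the following one-liner (a suggested helper, not a required format) prints the Python, PyTorch, and CUDA versions:
        `python -c "import sys, torch; print(sys.version.split()[0], torch.__version__, torch.version.cuda)"`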
---
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
        - label: 2. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Motivation
      description: |
        A clear and concise description of the motivation for the feature.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Related resources
      description: |
        If there is an official code release or third-party implementations, please also provide that information here; it would be very helpful.
To launch the application, simply run:

```
python run_gradio.py
```

- The demo defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model.
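
For example, the following command (an illustrative combination of the options above) launches the FLUX.1-dev demo with the quantized text encoder and without the safety checker:

```
python run_gradio.py -m dev --use-qencoder --no-safety-checker
```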

## Command Line Inference

We provide a script, [generate.py](generate.py), that generates an image from a text prompt:

```
python generate.py --prompt "Your Text Prompt"
```

- The generated image will be saved as `output.png` by default. You can specify a different path using the `-o` or `--output-path` options.
- The script defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.
- You can specify `--use-qencoder` to use our W4A16 text encoder.
- You can adjust the number of inference steps and guidance scale with `-t` and `-g`, respectively. For the FLUX.1-schnell model, the defaults are 4 steps and a guidance scale of 0; for the FLUX.1-dev model, the defaults are 50 steps and a guidance scale of 3.5.
- When using the FLUX.1-dev model, you can also load a LoRA adapter with `--lora-name`. Available choices are `None`, [`Anime`](https://huggingface.co/alvdansen/sonny-anime-fixed), [`GHIBSKY Illustration`](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration), [`Realism`](https://huggingface.co/XLabs-AI/flux-RealismLora), [`Children Sketch`](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch), and [`Yarn Art`](https://huggingface.co/linoyts/yarn_art_Flux_LoRA), with the default set to `None`. You can also specify the LoRA weight with `--lora-weight`, which defaults to 1.
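
Putting several of these options together (the prompt, LoRA weight, and output filename below are illustrative):

```
python generate.py --prompt "a watercolor fox in a snowy forest" -m dev --lora-name Anime --lora-weight 0.8 -o fox.png
```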

## Latency Benchmark

To measure the latency of our INT4 models, use the following command:

```
python latency.py
```

- The script defaults to the INT4 FLUX.1-schnell model. To switch to FLUX.1-dev, use the `-m dev` option. For BF16 precision, add `-p bf16`.
- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- By default, the script measures the end-to-end latency of generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup-times` and `--test-times`. The defaults are 2 warmup runs and 10 test runs.
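
For example, the following command (with illustrative warmup and test counts) measures the per-step latency of the INT4 FLUX.1-dev model:

```
python latency.py -m dev --mode step --warmup-times 5 --test-times 20
```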

## Quality Results

To generate the images used for the quality evaluation, run:

```
python evaluate.py -p int4
python evaluate.py -p bf16
```

- The commands above will generate images from FLUX.1-schnell on both datasets. Use `-m dev` to switch to FLUX.1-dev, or specify a single dataset with `-d MJHQ` or `-d DCI`.
- By default, generated images are saved to `results/$MODEL/$PRECISION`. Customize the output path using the `-o` option if desired.
- You can also adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- To accelerate generation, you can distribute the workload across multiple GPUs: if you have $N$ GPUs, then on GPU $i$ ($0 \le i < N$) add the options `--chunk-start $i --chunk-step $N`. Each GPU then handles a distinct portion of the workload (see the sketch below).
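
For instance, with 2 GPUs you might launch one process per GPU (a sketch; it assumes each process is pinned to one GPU via `CUDA_VISIBLE_DEVICES`):

```
CUDA_VISIBLE_DEVICES=0 python evaluate.py -p int4 --chunk-start 0 --chunk-step 2 &
CUDA_VISIBLE_DEVICES=1 python evaluate.py -p int4 --chunk-start 1 --chunk-step 2 &
wait
```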

Finally, you can compute the metrics for the generated images with: