Unverified Commit ccd93d1e authored by Muyang Li, committed by GitHub

Merge pull request #315 from mit-han-lab/dev

fix: support for transformer block indices > 19 in forward_layer (FluxModel.cpp)
docs: add the contribution and test guidance
parents fa765637 ffb1dff5
<!-- Thank you for your contribution—we truly appreciate it! To help us review your pull request efficiently, please follow the guidelines below. If anything is unclear, feel free to open the PR and ask for clarification. You can also refer to our [contribution guide](./docs/contribution_guide.md) for more details. -->
## Motivation
@@ -11,8 +11,8 @@
## Checklist
- [ ] Code is formatted using Pre-Commit hooks.
- [ ] Relevant unit tests are added in the [`tests`](../tests) directory following the guidance in [`tests/README.md`](../tests/README.md).
- [ ] [README](../README.md) and example scripts in [`examples`](../examples) are updated if necessary.
- [ ] Throughput/latency benchmarks and quality evaluations are included where applicable.
- [ ] **For reviewers:** If you're only helping merge the main branch and haven't contributed code to this PR, please remove yourself as a co-author when merging.
- [ ] Please feel free to join our [Slack](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [Discord](https://discord.gg/Wk6PnwX9Sm) or [WeChat](https://github.com/mit-han-lab/nunchaku/blob/main/assets/wechat.jpg) to discuss your PR.
\ No newline at end of file
@@ -145,7 +145,7 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
conda install -c conda-forge gxx=11 gcc=11
```
For Windows users, you can download and install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
Then build the package from source with
@@ -322,6 +322,13 @@ If you find `nunchaku` useful or relevant to your research, please cite our pape
}
```
## Contribution
We warmly welcome contributions from the community! To get started, please refer to our [contribution guide](docs/contribution_guide.md) for instructions on how to contribute code to Nunchaku.
## Contact Us
For enterprises interested in adopting SVDQuant or Nunchaku, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at muyangli@mit.edu.
## Related Projects
* [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
@@ -331,10 +338,6 @@ If you find `nunchaku` useful or relevant to your research, please cite our pape
* [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
* [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
* [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Acknowledgments
We thank MIT-IBM Watson AI Lab, MIT and Amazon Science Hub, MIT AI Hardware Program, National Science Foundation, Packard Foundation, Dell, LG, Hyundai, and Samsung for supporting this research. We thank NVIDIA for donating the DGX server.
# Contribution Guide
Welcome to **Nunchaku**! We appreciate your interest in contributing. This guide outlines how to set up your environment, run tests, and submit a Pull Request (PR). Whether you're fixing a minor bug or implementing a major feature, we encourage you to follow these steps for a smooth and efficient contribution process.
## 🚀 Setting Up & Building from Source
### 1. Fork and Clone the Repository
> 📌 **Note:** As a new contributor, you won’t have write access to the official Nunchaku repository. Please fork the repository to your own GitHub account, then clone your fork locally:
```shell
git clone https://github.com/<your_username>/nunchaku.git
```
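Optionally, you can add the official repository as an `upstream` remote so you can keep your fork in sync later. This is a generic git convenience, not a required step:
```shell
cd nunchaku
git remote add upstream https://github.com/mit-han-lab/nunchaku.git
git fetch upstream
```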
### 2. Install Dependencies & Build
To install dependencies and build the project, follow the instructions in our [README](../README.md#installation).
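As a rough sketch only: a from-source build of a Python/CUDA extension project of this kind typically boils down to the commands below. The exact prerequisites (compiler, CUDA/PyTorch versions, submodules) are spelled out in the [README](../README.md#installation), which takes precedence over this sketch.
```shell
# Assumption: a standard editable install from the repository root.
# Check the README for the authoritative prerequisites and steps.
git submodule update --init --recursive   # only if the repository uses git submodules
pip install -e .                          # builds the extension and installs nunchaku in editable mode
```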
## 🧹 Code Formatting with Pre-Commit
We use [pre-commit](https://pre-commit.com/) hooks to ensure code style consistency. Please install and run it before submitting your changes:
```shell
pip install pre-commit
pre-commit install
pre-commit run --all-files
```
- `pre-commit run --all-files` manually triggers all checks and automatically fixes issues where possible. If it fails initially, re-run until all checks pass.
* **Ensure your code passes all checks before opening a PR.**
* 🚫 **Do not commit directly to the `main` branch.** Always create a feature branch (e.g., `feat/my-new-feature`), commit your changes there, and open a PR from that branch, as in the sketch below.
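A typical flow looks like this; the branch name and commit message are only placeholders:
```shell
git checkout -b feat/my-new-feature      # work on a feature branch, never on main
pre-commit run --all-files               # make sure all formatting checks pass
git add -A
git commit -m "feat: add my new feature"
git push origin feat/my-new-feature      # then open a PR from your fork on GitHub
```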
## 🧪 Running Unit Tests & Integrating with CI
Nunchaku uses `pytest` for unit testing. If you're adding a new feature, please include corresponding test cases in the [`tests`](../tests) directory.
For detailed guidance on testing, refer to the [`tests/README.md`](../tests/README.md).
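As a minimal illustration (the file and function names below are hypothetical), a test is simply a `pytest`-discoverable function placed under `tests/`:
```python
# tests/test_example.py -- hypothetical file; follow the layout described in tests/README.md
import torch


def test_my_feature_matches_reference():
    # Sketch: compare the path under test against a plain float reference within a tolerance.
    x = torch.randn(4, 8)
    w = torch.randn(8, 8)
    reference = x @ w
    output = x @ w  # replace with the optimized/quantized path your PR introduces
    assert torch.allclose(output, reference, atol=1e-3)
```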
## Acknowledgments
This contribution guide is adapted from [SGLang](https://docs.sglang.ai/references/contribution_guide.html). We thank them for the inspiration.
\ No newline at end of file
@@ -883,12 +883,22 @@ std::tuple<Tensor, Tensor> FluxModel::forward_layer(
Tensor controlnet_block_samples,
Tensor controlnet_single_block_samples) {
if (layer < transformer_blocks.size()) {
    // The layer index addresses a joint transformer block directly.
    std::tie(hidden_states, encoder_hidden_states) = transformer_blocks.at(layer)->forward(
        hidden_states, encoder_hidden_states, temb, rotary_emb_img, rotary_emb_context, 0.0f);
} else {
    // Indices past the last joint block are wrapped back into range by subtracting
    // the block count, instead of letting .at() throw std::out_of_range.
    std::tie(hidden_states, encoder_hidden_states) =
        transformer_blocks.at(layer - transformer_blocks.size())->forward(
            hidden_states, encoder_hidden_states, temb, rotary_emb_img, rotary_emb_context, 0.0f);
}
const int txt_tokens = encoder_hidden_states.shape[1];
const int img_tokens = hidden_states.shape[1];
# Nunchaku Tests
Nunchaku uses pytest as its testing framework.
## Setting Up Test Environments
After installing `nunchaku` as described in the [README](../README.md#installation), you can install the test dependencies with:
```shell
pip install -r tests/requirements.txt
```
## Running the Tests
```shell
HF_TOKEN=$YOUR_HF_TOKEN pytest -v tests/flux/test_flux_memory.py
HF_TOKEN=$YOUR_HF_TOKEN pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py
HF_TOKEN=$YOUR_HF_TOKEN pytest -v tests/sana
```
> **Note:** `$YOUR_HF_TOKEN` refers to your Hugging Face access token, required to download models and datasets. You can create one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
> If you've already logged in using `huggingface-cli login`, you can skip setting this environment variable.
Some tests generate images using the original 16-bit models. You can cache these results to speed up future test runs by setting the environment variable `NUNCHAKU_TEST_CACHE_ROOT`. If not set, the images will be saved in `test_results/ref`.
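For example, to reuse the cached 16-bit reference images across runs, point the cache at a persistent directory before invoking `pytest` (the path below is just a placeholder):
```shell
export NUNCHAKU_TEST_CACHE_ROOT=/path/to/nunchaku_test_cache
HF_TOKEN=$YOUR_HF_TOKEN pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py
```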
## Writing Tests
When adding a new feature, please include corresponding test cases in the [`tests`](./) directory. **Please avoid modifying existing tests.**
To test visual output correctness, you can:
1. **Generate reference images:** Use the original 16-bit model to produce a small number of reference images (e.g., 4).
2. **Generate comparison images:** Run your method using the **same inputs and seeds** to ensure deterministic outputs. You can control the seed by setting the `generator` parameter in the diffusers pipeline.
3. **Compute similarity:** Evaluate the similarity between your outputs and the reference images using the [LPIPS](https://arxiv.org/abs/1801.03924) metric. Use the `compute_lpips` function provided in [`tests/flux/utils.py`](flux/utils.py):
```python
lpips = compute_lpips(dir1, dir2)
```
Here, `dir1` should point to the directory containing the reference images, and `dir2` should contain the images generated by your method.
### Setting the LPIPS Threshold
To pass the test, the LPIPS score must be below a predefined threshold—typically **< 0.3**. We recommend first running the comparison locally to observe the LPIPS value, and then setting the threshold slightly above that value to allow for minor variations. Since the test is based on a small sample of images, slight fluctuations are expected; a margin of **+0.04** is generally sufficient.
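Putting these steps together, a visual-correctness test might look like the sketch below. It is only an illustration: the model IDs and the `NunchakuFluxTransformer2dModel` loading follow the usage pattern in the README as an assumption, the `generate_images` helper is hypothetical, and the `compute_lpips` import path may differ from your layout.
```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel  # assumed API; see the README usage example
from tests.flux.utils import compute_lpips           # utility referenced above; adjust the import path if needed


def generate_images(pipeline, prompts, save_dir, seed=0):
    """Hypothetical helper: run the pipeline with a fixed seed per prompt and save the outputs."""
    save_dir.mkdir(parents=True, exist_ok=True)
    for i, prompt in enumerate(prompts):
        generator = torch.Generator("cuda").manual_seed(seed + i)  # same seeds => deterministic outputs
        image = pipeline(prompt, generator=generator, num_inference_steps=4).images[0]
        image.save(save_dir / f"{i:03d}.png")


def test_flux_lpips(tmp_path):
    prompts = [  # a small reference set (e.g., 4 images)
        "a photo of an astronaut riding a horse on the moon",
        "a cup of coffee on a wooden table",
        "a watercolor painting of a fox in the snow",
        "a futuristic city skyline at sunset",
    ]

    # 1. Reference images from the original 16-bit model.
    ref_pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    ).to("cuda")
    generate_images(ref_pipe, prompts, tmp_path / "ref")
    del ref_pipe
    torch.cuda.empty_cache()

    # 2. Images from the method under test, using the same prompts and seeds.
    transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
    test_pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
    ).to("cuda")
    generate_images(test_pipe, prompts, tmp_path / "out")

    # 3. LPIPS between the two directories; set the threshold ~0.04 above the value observed locally.
    lpips = compute_lpips(tmp_path / "ref", tmp_path / "out")
    assert lpips < 0.3
```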
## Acknowledgments
This testing guide is adapted from [SGLang](https://github.com/sgl-project/sglang/tree/main/test). We thank them for the inspiration.
\ No newline at end of file