Unverified commit f082491b authored by Muyang Li, committed by GitHub

docs: update docs (#544)

* update docs

* add docs

* add api reference

* fixing the links

* update

* docs: update the html theme

* chore: clean a useless workflow
parent 8dc0360e
name: Merge main into dev
on:
  workflow_dispatch:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  merge-main-into-dev:
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Checkout the repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_TOKEN }}
      - name: Check if main and dev are already in sync
        id: check_sync
        run: |
          git fetch origin main dev
          MAIN_SHA=$(git rev-parse origin/main)
          DEV_SHA=$(git rev-parse origin/dev)
          echo "main sha: $MAIN_SHA"
          echo "dev sha: $DEV_SHA"
          if [ "$MAIN_SHA" = "$DEV_SHA" ]; then
            echo "Branches are in sync. Skipping merge."
            echo "skip_merge=true" >> "$GITHUB_OUTPUT"
          else
            echo "Branches differ. Proceeding with merge."
            echo "skip_merge=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Merge main into dev
        id: last_commit
        if: steps.check_sync.outputs.skip_merge == 'false'
        run: |
          # Get author name and email from last commit on main
          AUTHOR_NAME=$(git log origin/main -1 --pretty=format:'%an')
          AUTHOR_EMAIL=$(git log origin/main -1 --pretty=format:'%ae')
          LAST_MSG=$(git log origin/main -1 --pretty=%s)
          echo "Author: $AUTHOR_NAME <$AUTHOR_EMAIL>"
          echo "Last commit message: $LAST_MSG"
          # Set Git user to last author
          git config --global user.name "$AUTHOR_NAME"
          git config --global user.email "$AUTHOR_EMAIL"
          git checkout dev
          git merge origin/main -m "[Auto Merge] $LAST_MSG"
          git push origin dev

<div align="center" id="nunchaku_logo">
  <img src="https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://nunchaku.tech/docs/nunchaku/"><b>Docs</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/nunchaku-tech"><b>Hugging Face</b></a> | <a href="https://modelscope.cn/organization/nunchaku-tech"><b>ModelScope</b></a> | <a href="https://github.com/nunchaku-tech/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>

**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/nunchaku-tech/deepcompressor).

Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeChat**](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/nunchaku-tech/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don't hesitate to reach out!

## News

- **[2025-07-13]** 🚀 The official [**Nunchaku documentation**](https://nunchaku.tech/docs/nunchaku/) is now live! Explore comprehensive guides and resources to help you get started.
- **[2025-06-29]** 🔥 **FLUX.1-Kontext** is now supported! Try out our [example script](./examples/flux.1-kontext-dev.py) to see it in action! Our demo is available at this [link](https://svdquant.mit.edu/kontext/)!
- **[2025-06-01]** 🚀 **Release v0.3.0!** This update adds support for multiple-batch inference, [**ControlNet-Union-Pro 2.0**](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0), initial integration of [**PuLID**](https://github.com/ToTheBeginning/PuLID), and introduces [**Double FB Cache**](examples/flux.1-dev-double_cache.py). You can now load Nunchaku FLUX models as a single file, and our upgraded [**4-bit T5 encoder**](https://huggingface.co/nunchaku-tech/nunchaku-t5) now matches **FP8 T5** in quality!
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/nunchaku-tech/nunchaku/issues/266) and an [FAQ](https://github.com/nunchaku-tech/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku's development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!

<details>

- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](examples/sana1.6b_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
- **[2024-12-08]** [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is now supported. Please check [ComfyUI-nunchaku](https://github.com/nunchaku-tech/ComfyUI-nunchaku) for usage.
- **[2024-11-07]** 🔥 Our latest **W4A4** diffusion model quantization work, [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant), is publicly released! Check [**DeepCompressor**](https://github.com/nunchaku-tech/deepcompressor) for the quantization library.

</details>

## Overview

![teaser](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/teaser.jpg)

**Nunchaku** is a high-performance inference engine for low-bit neural networks. It implements **SVDQuant**, a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On the 12B FLUX.1-dev, it achieves a 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it offers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, 3× faster than the NF4 W4A16 baseline. On PixArt-∑, it demonstrates significantly superior visual quality over other W4A4 or even W4A8 baselines. "E2E" means the end-to-end latency including the text encoder and VAE decoder.

**SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models**<br>
https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8

#### Quantization Method -- SVDQuant

![intuition](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/intuition.gif)Overview of SVDQuant. Stage 1: Originally, both the activation $\boldsymbol{X}$ and weights $\boldsymbol{W}$ contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation $\hat{\boldsymbol{X}}$ and weights $\hat{\boldsymbol{W}}$. While $\hat{\boldsymbol{X}}$ becomes easier to quantize, $\hat{\boldsymbol{W}}$ now becomes more difficult. Stage 3: SVDQuant further decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
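
Below is a small, self-contained PyTorch sketch of this idea, not Nunchaku's actual kernels or quantizer: it splits a toy weight matrix into a rank-32 branch kept in high precision plus a coarsely quantized residual, and compares the resulting output error against quantizing the whole matrix directly. The toy matrix is deliberately constructed with a dominant low-rank component so the effect is easy to see; in SVDQuant this structure comes from the outlier migration in Stage 2.

```python
import torch

def fake_quant_4bit(w: torch.Tensor) -> torch.Tensor:
    """Illustrative symmetric per-tensor 4-bit fake quantization (not SVDQuant's real quantizer)."""
    scale = w.abs().max() / 7.0
    return torch.clamp((w / scale).round(), -8, 7) * scale

torch.manual_seed(0)
# Toy weight with a dominant low-rank component, so the benefit of the low-rank branch is visible.
W = torch.randn(1024, 32) @ torch.randn(32, 1024) + 0.05 * torch.randn(1024, 1024)
X = torch.randn(8, 1024)
rank = 32

# W ≈ L1 @ L2 (kept in high precision) + residual (quantized to 4 bits)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
L1, L2 = U[:, :rank] * S[:rank], Vh[:rank]
residual = W - L1 @ L2

Y_ref = X @ W.T                                               # full-precision reference
Y_svdq = X @ (L1 @ L2).T + X @ fake_quant_4bit(residual).T    # low-rank branch + 4-bit residual
Y_naive = X @ fake_quant_4bit(W).T                            # directly quantizing W

print("relative error with low-rank branch:", ((Y_svdq - Y_ref).norm() / Y_ref.norm()).item())
print("relative error without it:          ", ((Y_naive - Y_ref).norm() / Y_ref.norm()).item())
```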

#### Nunchaku Engine Design

![engine](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku optimizes this overhead away with kernel fusion. (b) The *Down Projection* and *Quantize* kernels use the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data movement overhead, we fuse the first two and the latter two kernels together.

## Performance

![efficiency](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.

## Getting Started

- [Installation Guide](https://nunchaku.tech/docs/nunchaku/installation/installation.html)
- [Usage Tutorial](https://nunchaku.tech/docs/nunchaku/usage/basic_usage.html)
- [ComfyUI Plugin: ComfyUI-nunchaku](https://github.com/nunchaku-tech/ComfyUI-nunchaku)
- [Custom Model Quantization: DeepCompressor](https://github.com/nunchaku-tech/deepcompressor)
- [Gradio Demo Apps](https://github.com/nunchaku-tech/nunchaku/tree/main/app)
- [Reproduce SVDQuant Paper Results](app/flux.1/t2i)
- [API Reference](https://nunchaku.tech/docs/nunchaku/python_api/nunchaku.html)
- [Contribution Guide](https://nunchaku.tech/docs/nunchaku/developer/contribution_guide.html)
- [Frequently Asked Questions](https://nunchaku.tech/docs/nunchaku/faq/faq.html)

## Roadmap

Please check [here](https://github.com/nunchaku-tech/nunchaku/issues/431) for the summer roadmap.

## Contact Us

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=nunchaku-tech/nunchaku&type=Date)](https://www.star-history.com/#nunchaku-tech/nunchaku&Date)

<div align="center" id="nunchaku_logo">
  <img src="https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://nunchaku.tech/docs/nunchaku/"><b>Docs</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/nunchaku-tech"><b>Hugging Face</b></a> | <a href="https://modelscope.cn/organization/nunchaku-tech"><b>ModelScope</b></a> | <a href="https://github.com/nunchaku-tech/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>

**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, see [DeepCompressor](https://github.com/nunchaku-tech/deepcompressor).

Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeChat**](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/wechat.jpg) to engage with the community! More details can be found [here](https://github.com/nunchaku-tech/nunchaku/issues/149). If you have questions, run into bugs, or would like to contribute, feel free to reach out!

## News

- **[2025-07-13]** 🚀 The official [**Nunchaku documentation**](https://nunchaku.tech/docs/nunchaku/) is now live! Check out the detailed guides and resources to get started.
- **[2025-06-29]** 🔥 **FLUX.1-Kontext** is now supported! Try our [example script](./examples/flux.1-kontext-dev.py), and see the online demo [here](https://svdquant.mit.edu/kontext/)!
- **[2025-06-01]** 🚀 **v0.3.0 released!** This update adds multi-batch inference, [**ControlNet-Union-Pro 2.0**](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0), initial [**PuLID**](https://github.com/ToTheBeginning/PuLID) integration, and [**Double FB Cache**](examples/flux.1-dev-double_cache.py). Nunchaku FLUX models can now be loaded from a single file, and the upgraded [**4-bit T5 encoder**](https://huggingface.co/nunchaku-tech/nunchaku-t5) matches **FP8 T5** in quality!
- **[2025-04-16]** 🎥 Released [**installation and usage tutorial videos**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) in English and Chinese ([**Bilibili**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee)).
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/nunchaku-tech/nunchaku/issues/266) and an [FAQ](https://github.com/nunchaku-tech/nunchaku/discussions/262) to help the community get started and follow the latest progress.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This update brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support, with faster inference powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). [**20-series GPUs**](examples/flux.1-dev-turing.py) are now supported, making Nunchaku more accessible than ever!

<details>
<summary>More</summary>

- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 released!** Adds support for the [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to **4 GiB** while keeping a **2–3× speedup**. This release also fixes issues with resolution, LoRA, pinned memory, and runtime stability. See the release notes for details!
- **[2025-02-20]** 🚀 **NVFP4 precision is supported on the NVIDIA RTX 5090!** NVFP4 delivers better image quality than INT4 and runs **~3×** faster than BF16 on the RTX 5090. Details are in the [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), usage is in [`examples`](./examples), and an online demo is available [here](https://svdquant.mit.edu/flux1-schnell/)!
- **[2025-02-18]** 🔥 Tutorials for [**customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) are now available! The **[ComfyUI](./comfyui)** workflows now support **customized LoRAs** and **FLUX.1-Tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! The FLUX.1-tools Gradio demos are now live!** See [here](#gradio-demos) for details. A new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online, give it a try!
- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** It runs **2-3×** faster than the original models. See the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** It runs 2-3× faster than the 16-bit model. See the [usage example](examples/sana1.6b_pag.py) and the [deployment guide](app/sana/t2i). Try the online demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
- **[2024-12-08]** [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is now supported. See [ComfyUI-nunchaku](https://github.com/nunchaku-tech/ComfyUI-nunchaku) for usage.
- **[2024-11-07]** 🔥 Our latest **W4A4** diffusion model quantization work, [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant), is publicly released! See [**DeepCompressor**](https://github.com/nunchaku-tech/deepcompressor) for the quantization library.

</details>
## 项目概 ##
![teaser](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/teaser.jpg) ![teaser](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/teaser.jpg)
**Nunchaku** 是一款专为低精度神经网络设计的高性能推理引擎。实现了 **SVDQuant**,一种支持4-bit权重和激活的后训练量化技术,能有效保持视觉质量。在12B FLUX.1-dev模型上,相比BF16模型实现了3.6倍内存压缩。通过消除CPU offloading,在16GB笔记本RTX 4090上比16模型快8.7,比NF4 W4A16基线快3倍。在PixArt-∑模型上,视觉质量显著优于其他W4A4甚至W4A8方案。"E2E"表示包含文本编码器和VAE解码器的端到端延迟。 **Nunchaku** 是一款面向低比特神经网络的高性能推理引擎。实现了 **SVDQuant**,一种针对 4-bit 权重和激活的后训练量化技术,能很好地保持视觉质量。在 12B FLUX.1-dev 上,相比 BF16 模型实现了 3.6× 显存缩减。通过消除 CPU 下放,在 16GB 笔记本 4090 GPU 上比 16-bit 模型快 8.7×,比 NF4 W4A16 基线快。在 PixArt-∑ 上,视觉质量显著优于其他 W4A4 甚至 W4A8 基线。"E2E" 指包括文本编码器和 VAE 解码器的端到端延迟。
**SVDQuant: 通过低秩分吸收异常值实现4-bit扩散模型量化**<br> **SVDQuant: 通过低秩分吸收异常值实现 4-bit Diffusion 模型**<br>
[Muyang Li](https://lmxyy.me)\*, [Yujun Lin](https://yujunlin.com)\*, [Zhekai Zhang](https://hanlab.mit.edu/team/zhekai-zhang)\*, [Tianle Cai](https://www.tianle.website/#/), [Xiuyu Li](https://xiuyuli.com), [Junxian Guo](https://github.com/JerryGJX), [Enze Xie](https://xieenze.github.io), [Chenlin Meng](https://cs.stanford.edu/~chenlin/), [Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/), [Song Han](https://hanlab.mit.edu/songhan) <br> [Muyang Li](https://lmxyy.me)\*[Yujun Lin](https://yujunlin.com)\*[Zhekai Zhang](https://hanlab.mit.edu/team/zhekai-zhang)\*[Tianle Cai](https://www.tianle.website/#/)[Xiuyu Li](https://xiuyuli.com)[Junxian Guo](https://github.com/JerryGJX)[Enze Xie](https://xieenze.github.io)[Chenlin Meng](https://cs.stanford.edu/~chenlin/)[Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/)[Song Han](https://hanlab.mit.edu/songhan) <br>
*麻省理工学院、英伟达、卡内基梅隆大学、普林斯顿大学、加州大学伯克利分校、上海交通大学、pika实验室* <br> *MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, Pika Labs* <br>
https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8 https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
## 方法原理 ## 方法
#### 量化方法 -- SVDQuant #### 量化方法 -- SVDQuant
![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)SVDQuant三阶段示意图。阶段1:原始激活 $\boldsymbol{X}$ 和权重 $\boldsymbol{W}$ 含异常值,4-bit量化困难。阶段2:将激活异常值迁移权重,得到更新的激活 $\hat{\boldsymbol{X}}$ 和权重 $\hat{\boldsymbol{W}}$。虽然 $\hat{\boldsymbol{X}}$ 更易量化,但 $\hat{\boldsymbol{W}}$ 变得更难量化。阶段3:SVDQuant 进一步通过 SVD 将 $\hat{\boldsymbol{W}}$ 分解为低秩分 $\boldsymbol{L}_1\boldsymbol{L}_2$ 和残差 $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$。通过16位精度运行低秩分支来缓解量化难度。 ![intuition](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/intuition.gif)SVDQuant 概览。阶段1:原始激活 $\boldsymbol{X}$ 和权重 $\boldsymbol{W}$ 都包含异常值,导致 4-bit 量化困难。阶段2:我们将异常值从激活迁移权重,得到更新的激活 $\hat{\boldsymbol{X}}$ 和权重 $\hat{\boldsymbol{W}}$。此时 $\hat{\boldsymbol{X}}$ 更易量化,但 $\hat{\boldsymbol{W}}$ 更难。阶段3:SVDQuant 进一步将 $\hat{\boldsymbol{W}}$ 分解为低秩分 $\boldsymbol{L}_1\boldsymbol{L}_2$ 和残差 $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$。低秩分支用 16-bit 精度运行,从而缓解量化难度。
#### Nunchaku 引擎设计 #### Nunchaku 引擎设计
![engine](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/engine.jpg) (a) 原始低秩分支(秩32)因在 *下投影* 中额外读取16位输入和在 *上投影* 中额外写入16位输出而引入57%的延迟开销。Nunchaku 通过核融合优化此开销。(b) *下投影**量化* 核使用相同输入,*上投影**4-bit计算* 核共享相同输出。为减少数据搬运开销,我们将前两个核和后两个核分别融合。 ![engine](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/engine.jpg) (a) 直接用 rank 32 跑低秩分支会带来 57% 延迟开销,因需额外读写 16-bit 输入/输出。Nunchaku 通过核融合优化此开销。(b) *Down Projection**Quantize* 内核输入相同,*Up Projection**4-Bit Compute* 内核输出相同。为减少数据搬运,Nunchaku 将前两和后两分别融合。
## 性能表现 ## 性能
![efficiency](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/efficiency.jpg)SVDQuant 将12B FLUX.1模型体积减少了3.6倍,并将16位模型的显存使用量减少了3.5倍。借助 Nunchaku,我们的 INT4 模型在桌面和笔记本 NVIDIA RTX 4090 GPU 上比 NF4 W4A16 基线快3.0倍。值得注意的是,在笔记本4090上,通过消除CPU offloading,总加速达到了10.1倍。我们的 NVFP4 模型在 RTX 5090 GPU 上也比 BF16 和 NF4 快3.1 ![efficiency](https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/efficiency.jpg)SVDQuant 将 12B FLUX.1 模型体积缩小 3.6×,显存占用降至 16-bit 模型的 1/3.5。Nunchaku 的 INT4 模型在桌面和笔记本 4090 上比 NF4 W4A16 基线快 3.0×。在笔记本 4090 上,通过消除 CPU 下放,总加速比达 10.1×。NVFP4 模型在 RTX 5090 上也比 BF16 和 NF4 快 3.1×

## Getting Started

- [Installation Guide](https://nunchaku.tech/docs/nunchaku/installation/installation.html)
- [Usage Tutorial](https://nunchaku.tech/docs/nunchaku/usage/basic_usage.html)
- [ComfyUI Plugin: ComfyUI-nunchaku](https://github.com/nunchaku-tech/ComfyUI-nunchaku)
- [Custom Model Quantization: DeepCompressor](https://github.com/nunchaku-tech/deepcompressor)
- [Gradio Demo Apps](https://github.com/nunchaku-tech/nunchaku/tree/main/app)
- [Reproduce SVDQuant Paper Results](app/flux.1/t2i)
- [API Reference](https://nunchaku.tech/docs/nunchaku/python_api/nunchaku.html)
- [Contribution Guide](https://nunchaku.tech/docs/nunchaku/developer/contribution_guide.html)
- [FAQ](https://nunchaku.tech/docs/nunchaku/faq/faq.html)

## Roadmap

See [here](https://github.com/nunchaku-tech/nunchaku/issues/431) for the summer development plan.

## Contact Us

For enterprise adoption of SVDQuant or Nunchaku, including technical consulting, sponsorship, or partnership inquiries, please contact muyangli@mit.edu.

## Related Projects

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=nunchaku-tech/nunchaku&type=Date)](https://www.star-history.com/#nunchaku-tech/nunchaku&Date)
templates_path = ["_templates"]
exclude_patterns = []

# -- Include global link definitions -----------------------------------------
links_dir = Path(__file__).parent / "links"
rst_epilog = ""
if links_dir.exists() and links_dir.is_dir():
    for link_file in sorted(links_dir.glob("*.txt")):
        with open(link_file, encoding="utf-8") as f:
            rst_epilog += f.read()

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
napoleon_google_docstring = False
napoleon_numpy_docstring = True

extlinks = {
    "nunchaku_issue": ("https://github.com/nunchaku-tech/nunchaku/issues/%s", "nunchaku#%s"),
    "comfyui-nunchaku_issue": ("https://github.com/nunchaku-tech/ComfyUI-nunchaku/issues/%s", "ComfyUI-nunchaku#%s"),
}

html_favicon = "_static/nunchaku.ico"

html_theme_options = {
    "repository_url": "https://github.com/nunchaku-tech/nunchaku",
    "repository_branch": "main",
    "path_to_docs": "docs/source",
    "use_repository_button": True,
    "use_edit_page_button": True,
    "use_issues_button": True,
    "use_download_button": True,
    "logo_only": False,
    "show_navbar_depth": 2,
    "home_page_in_toc": True,
    "show_toc_level": 2,
    # "announcement": "🔥 Nunchaku v1.2 released!",
}

intersphinx_mapping = {
    "comfyui_nunchaku": ("https://nunchaku.tech/docs/ComfyUI-nunchaku", None),
}
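
# Illustration only (not part of the project's conf.py): Sphinx expands an extlinks role such as
# :nunchaku_issue:`249` by substituting the role target into the "%s" placeholders defined above.
_url_tmpl, _caption_tmpl = extlinks["nunchaku_issue"]
assert _url_tmpl % "249" == "https://github.com/nunchaku-tech/nunchaku/issues/249"
assert _caption_tmpl % "249" == "nunchaku#249"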

.. note::

   As a new contributor, you won't have write access to the `Nunchaku repository <github_nunchaku_>`_.
   Please fork the repository to your own GitHub account, then clone your fork locally:

   .. code-block:: shell

🧹 Code Formatting with Pre-Commit
----------------------------------

We use `pre-commit <https://pre-commit.com/>`_ hooks to ensure code style consistency.
Please install and run it before submitting your changes:

.. code-block:: shell

   pre-commit install
   pre-commit run --all-files

- ``pre-commit run --all-files`` manually triggers all checks and automatically fixes issues where possible.
  If it fails initially, re-run until all checks pass.
- ✅ **Ensure your code passes all checks before opening a PR.**

Running the Tests
~~~~~~~~~~~~~~~~~

.. note::

   ``$YOUR_HF_TOKEN`` refers to your Hugging Face access token,
   required to download models and datasets.
   You can create one at https://huggingface.co/settings/tokens.
   If you've already logged in using ``huggingface-cli login``,
   you can skip setting this environment variable.

Some tests generate images using the original 16-bit models.
You can cache these results to speed up future test runs by setting the environment variable ``NUNCHAKU_TEST_CACHE_ROOT``. If not set, the images will be saved in ``test_results/ref``.
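
A minimal sketch (not an official helper) of exporting these variables and launching the suite from Python; the ``HF_TOKEN`` variable name and the ``tests`` path are assumptions based on common conventions:

.. code-block:: python

   import os
   import pytest

   os.environ.setdefault("HF_TOKEN", "<your_hf_token>")  # assumed variable name; skip if already logged in
   os.environ.setdefault("NUNCHAKU_TEST_CACHE_ROOT", "/tmp/nunchaku_test_cache")

   # Equivalent to running `pytest tests -q` from the repository root.
   pytest.main(["tests", "-q"])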

Writing Tests
~~~~~~~~~~~~~

When adding a new feature,
please include corresponding test cases in the ``tests`` directory.
**Please avoid modifying existing tests.**

To test visual output correctness, you can:

1. **Generate reference images:**
   Use the original 16-bit model to produce a small number of reference images (e.g., 4).

2. **Generate comparison images:**
   Run your method using the **same inputs and seeds** to ensure deterministic outputs.
   You can control the seed by setting the ``generator`` parameter in the diffusers pipeline.

3. **Compute similarity:**
   Evaluate the similarity between your outputs and the reference images
   using the `LPIPS <https://arxiv.org/abs/1801.03924>`_ metric.
   Use the ``compute_lpips`` function provided in `tests/flux/utils.py <https://github.com/nunchaku-tech/nunchaku/blob/main/tests/flux/utils.py>`_:

   .. code-block:: python

      lpips = compute_lpips(dir1, dir2)

   - ``dir1``: Directory containing the reference images.
   - ``dir2``: Directory containing the images generated by your method.

**Setting the LPIPS Threshold**

To pass the test, the LPIPS score should be **below a predefined threshold**—typically **< 0.3**.

- First, run the comparison locally to observe the LPIPS value.
- Set the threshold slightly above your observed value to accommodate minor variations
  (a margin of **+0.04** is generally sufficient).
- Note that, due to the small sample size, slight fluctuations are expected.

By following these guidelines, you help maintain the reliability and reproducibility of Nunchaku's test suite.
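
Putting the three steps together, a test might look like the following minimal sketch; only ``compute_lpips`` comes from ``tests/flux/utils.py``, while the directory layout, the observed LPIPS value, and the threshold are illustrative assumptions:

.. code-block:: python

   from tests.flux.utils import compute_lpips  # helper provided by the Nunchaku test suite

   def test_my_feature_matches_reference():
       ref_dir = "test_results/ref/my_feature"  # reference images from the 16-bit model (illustrative path)
       out_dir = "test_results/out/my_feature"  # images generated by the new feature (illustrative path)
       lpips = compute_lpips(ref_dir, out_dir)
       # Suppose ~0.24 was observed locally (hypothetical); add a +0.04 margin for run-to-run noise.
       assert lpips < 0.28
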
.. _faq_faq:

Frequently Asked Questions (FAQ)
================================

Model
=====

Which model should I choose: INT4 or FP4?
-----------------------------------------

- For **Blackwell GPUs** (such as the RTX 50-series), please use our **FP4** models for hardware compatibility.
- For all other GPUs, please use our **INT4** models.
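
As an illustrative sketch (not from the official docs), you can branch on the GPU's CUDA compute capability to pick a precision automatically; the ``>= 10`` cutoff for Blackwell is an assumption, and a visible CUDA device is required:

.. code-block:: python

   import torch

   def pick_precision() -> str:
       # Blackwell-class GPUs report compute capability 10.x/12.x (assumed cutoff); older GPUs use INT4.
       major, _minor = torch.cuda.get_device_capability()
       return "fp4" if major >= 10 else "int4"

   print(f"Use the {pick_precision().upper()} Nunchaku models for this GPU.")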

Out of memory or slow model loading
-----------------------------------

If you encounter out-of-memory errors or notice that model loading is unusually slow, please try the following steps:

- **Update your CUDA driver** to the latest version, as this can resolve many compatibility and performance issues.
- **Set the environment variable** :envvar:`NUNCHAKU_LOAD_METHOD` to either ``READ`` or ``READNOPIN``.

.. seealso::

   **Related issues:** :nunchaku_issue:`249`, :nunchaku_issue:`276`, :nunchaku_issue:`311`
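
A minimal sketch of setting this variable from Python; the variable name and its values come from above, while setting it as early as possible, before any model is loaded, is an assumption about the safe ordering:

.. code-block:: python

   import os

   # Set as early as possible, before Nunchaku loads any model weights.
   os.environ["NUNCHAKU_LOAD_METHOD"] = "READ"  # or "READNOPIN"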

Why do the same seeds produce slightly different images with Nunchaku?
----------------------------------------------------------------------

This behavior is due to minor precision noise introduced by the GPU's accumulation order.
Because modern GPUs execute operations out of order for better performance, small variations in output can occur, even with the same seed.
Enforcing strict accumulation order would reduce this variability but significantly hurt performance, so we do not plan to change this behavior.

.. seealso::

   **Related issues:** :nunchaku_issue:`229`, :nunchaku_issue:`294`
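
To see why accumulation order matters at all, here is a small, Nunchaku-independent illustration of floating-point non-associativity: summing the same float32 values in a different order may change the last bits of the result.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   x = rng.standard_normal(1_000_000).astype(np.float32)

   # The same numbers, summed in two different orders; the results may differ
   # slightly because float32 addition is not associative.
   print(x.sum(), x[::-1].sum())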

Nunchaku Documentation
======================

**Nunchaku** is a high-performance inference engine optimized for low-bit diffusion models and LLMs,
as introduced in our paper `SVDQuant <paper_svdquant_>`_.
Check out `DeepCompressor <github_deepcompressor_>`_ for the quantization library.

.. toctree::
   :maxdepth: 2
   :caption: Useful Tools
   :titlesonly:

   ComfyUI Plugin: ComfyUI-nunchaku <https://nunchaku.tech/docs/ComfyUI-nunchaku/>
   Custom Model Quantization: DeepCompressor <https://github.com/nunchaku-tech/deepcompressor>
   Gradio Demos <https://github.com/nunchaku-tech/nunchaku/tree/main/app>

.. toctree::
.. _installation-installation:

Installation
============

Installing Nunchaku
-------------------

Once PyTorch is installed, you can install ``nunchaku`` from one of the following sources:

- `GitHub Releases <github_nunchaku_releases_>`_
- `Hugging Face <hf_nunchaku_>`_
- `ModelScope <ms_nunchaku_>`_

.. code-block:: shell

   pip install https://github.com/nunchaku-tech/nunchaku/releases/download/v0.3.1/nunchaku-0.3.1+torch2.7-cp311-cp311-linux_x86_64.whl
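
As an optional sanity check (not part of the official instructions), confirm that the wheel imports in the same Python environment it was installed into:

.. code-block:: python

   # A failure here usually means the wheel targets a different Python or PyTorch
   # version than the active environment.
   import nunchaku
   print(nunchaku.__file__)  # location of the installed package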

For ComfyUI Users
^^^^^^^^^^^^^^^^^

If you're using the **ComfyUI portable package**,
ensure that ``nunchaku`` is installed into the Python environment bundled with ComfyUI. You can either:

- Use our :ref:`comfyui_nunchaku:install-wheel-json` workflow, or
- Manually install the wheel using the correct Python path.

Option 1: Using ``install_wheel.json`` Workflow
"""""""""""""""""""""""""""""""""""""""""""""""

With `ComfyUI-nunchaku <github_comfyui-nunchaku_>`_ v0.3.2+,
you can install Nunchaku using the provided
:ref:`comfyui_nunchaku:install-wheel-json` workflow directly in ComfyUI.

.. image:: https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/ComfyUI-nunchaku/workflows/install_wheel.png

Option 2: Manual Installation
"""""""""""""""""""""""""""""

.. code-block:: bat

   "G:\ComfyUI\python\python.exe" -m pip install https://github.com/nunchaku-tech/nunchaku/releases/download/v0.3.1/nunchaku-0.3.1+torch2.7-cp311-cp311-linux_x86_64.whl

For Blackwell GPUs (50-series)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Requirements
^^^^^^^^^^^^

- Linux: ``gcc/g++ >= 11``
- Windows: Latest **MSVC** via `Visual Studio <visual_studio_>`_

.. important::

   Currently supported GPU architectures:

.. code-block:: shell

   git clone https://github.com/nunchaku-tech/nunchaku.git
   cd nunchaku
   git submodule init
   git submodule update

Environment Setup
-----------------

1. Install CUDA
^^^^^^^^^^^^^^^^

Download and install the latest CUDA Toolkit from the official `NVIDIA CUDA Downloads <download_cuda_>`_.
After installation, verify it:

.. code-block:: bat

2. Install Visual Studio C++ Build Tools
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Download from the official `Visual Studio Build Tools page <visual_studio_>`_.
During installation, select the following workloads:

- **Desktop development with C++**
- **C++ tools for Linux development**

3. Install Git
^^^^^^^^^^^^^^

Download Git from `this link <download_git_win_>`_ and follow the installation steps.

4. (Optional) Install Conda
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Conda helps manage Python environments. You can install either Anaconda or Miniconda from the `official site <download_anaconda_>`_.

5. (Optional) Install ComfyUI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are various ways to install ComfyUI; for example, you can use the ComfyUI CLI.
Once Python is installed, you can install ComfyUI via the CLI:

.. code-block:: bat

Step 3: Install Nunchaku
^^^^^^^^^^^^^^^^^^^^^^^^^

Option 1: Use ``install_wheel.json`` Workflow in ComfyUI
""""""""""""""""""""""""""""""""""""""""""""""""""""""""

With `ComfyUI-nunchaku <github_comfyui-nunchaku_>`_ v0.3.2+,
you can install Nunchaku using the provided :ref:`comfyui_nunchaku:install-wheel-json` workflow directly in ComfyUI.

.. image:: https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/ComfyUI-nunchaku/workflows/install_wheel.png

Option 2: Manually Install Prebuilt Wheels
"""""""""""""""""""""""""""""""""""""""""""

You can install Nunchaku wheels from one of the following:

- `GitHub Releases <github_nunchaku_releases_>`_
- `Hugging Face <hf_nunchaku_>`_
- `ModelScope <ms_nunchaku_>`_

Example (for Python 3.11 + PyTorch 2.7):

.. code-block:: bat

   "G:\ComfyUI\python\python.exe" -m pip install https://github.com/nunchaku-tech/nunchaku/releases/download/v0.3.1/nunchaku-0.3.1+torch2.7-cp311-cp311-linux_x86_64.whl

To verify the installation:

Step 2: Clone the Repository
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bat

   git clone https://github.com/nunchaku-tech/nunchaku.git
   cd nunchaku
   git submodule init
   git submodule update

Use Nunchaku in ComfyUI
-----------------------
1. Install the Plugin 1. Install the Plugin
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
Clone the `ComfyUI-nunchaku <comfyui_nunchaku_>`_ plugin into the ``custom_nodes`` folder: Clone the `ComfyUI-nunchaku <github_comfyui-nunchaku_>`_ plugin into the ``custom_nodes`` folder:
.. code-block:: bat .. code-block:: bat
cd ComfyUI/custom_nodes cd ComfyUI/custom_nodes
git clone https://github.com/mit-han-lab/ComfyUI-nunchaku.git git clone https://github.com/nunchaku-tech/ComfyUI-nunchaku.git
Alternatively, install it using `ComfyUI-Manager <comfyui_manager_>`_ or ``comfy-cli``. Alternatively, install it using `ComfyUI-Manager <github_comfyui-manager_>`_ or `comfy-cli <github_comfy-cli_>`_.
2. Download Models
^^^^^^^^^^^^^^^^^^

**Standard FLUX.1-dev Models**

Start by downloading the standard `FLUX.1-dev text encoders <https://huggingface.co/comfyanonymous/flux_text_encoders>`__
and `VAE <https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors>`__.
You can also optionally download the original `BF16 FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors>`__ model. An example command:
.. code-block:: bat
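
    :: Illustrative sketch only (assumptions): the documented commands are collapsed in the diff hunk below,
    :: and the target folders follow current ComfyUI conventions (models/text_encoders, models/vae).
    huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
    huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
    :: FLUX.1-dev is a gated repository; run "huggingface-cli login" first.
    huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir models/vae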
@@ -219,12 +225,12 @@ Start by downloading the standard `FLUX.1-dev text encoders <https://huggingface
Next, download the Nunchaku 4-bit models to ``models/diffusion_models`` (an example download command follows the list):

- For **50-series GPUs**, use the `FP4 model <hf_nunchaku-flux1-dev-fp4_>`_.
- For **other GPUs**, use the `INT4 model <hf_nunchaku-flux1-dev-int4_>`_.
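A minimal ``huggingface-cli`` sketch (the repository and file names are taken from the model links above; pick the variant matching your GPU):

.. code-block:: bat

    :: INT4 variant shown; for 50-series GPUs, substitute svdq-fp4_r32-flux.1-dev.safetensors.
    huggingface-cli download mit-han-lab/nunchaku-flux.1-dev svdq-int4_r32-flux.1-dev.safetensors --local-dir models/diffusion_models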
**(Optional): Download Sample LoRAs**

You can test with some sample LoRAs like `FLUX.1-Turbo <hf_lora_flux-turbo_>`_ and `Ghibsky <hf_lora_ghibsky_>`_. Place these files in the ``models/loras`` directory:
.. code-block:: bat
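
    :: Illustrative sketch only (assumption): the documented commands are collapsed in the diff hunk below.
    :: Downloading each full LoRA repository avoids guessing individual file names; ComfyUI scans
    :: models/loras recursively, so per-LoRA subfolders are fine.
    huggingface-cli download alimama-creative/FLUX.1-Turbo-Alpha --local-dir models/loras/FLUX.1-Turbo-Alpha
    huggingface-cli download aleksa-codes/flux-ghibsky-illustration --local-dir models/loras/flux-ghibsky-illustration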
@@ -234,7 +240,7 @@ You can test with some sample LoRAs like `FLUX.1-Turbo <turbo_lora_>`_ and `Ghib
3. Set Up Workflows
^^^^^^^^^^^^^^^^^^^
To use the official workflows, download them from the `ComfyUI-nunchaku <github_comfyui-nunchaku_>`_ repository and place them in your ``ComfyUI/user/default/workflows`` directory. The command can be:
.. code-block:: bat
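
    :: Illustrative sketch only (assumption): the documented command is collapsed here.
    :: Run from the ComfyUI root; the example workflows ship with the plugin checkout from step 1.
    xcopy /E /I "custom_nodes\ComfyUI-nunchaku\example_workflows" "user\default\workflows"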
.. _blog_flux1-tools: https://bfl.ai/announcements/24-11-21-tools
.. _comfyui_examples_flux: https://comfyanonymous.github.io/ComfyUI_examples/flux/
.. _github_deepcompressor: https://github.com/nunchaku-tech/deepcompressor
.. _github_flux: https://github.com/black-forest-labs/flux
.. _github_diffusers: https://github.com/huggingface/diffusers
.. _github_comfyui-nunchaku: https://github.com/nunchaku-tech/ComfyUI-nunchaku
.. _github_nunchaku: https://github.com/nunchaku-tech/nunchaku
.. _github_nunchaku_releases: https://github.com/nunchaku-tech/nunchaku/releases
.. _github_comfyui-nunchaku_example_workflows: https://github.com/nunchaku-tech/ComfyUI-nunchaku/tree/main/example_workflows
.. _github_comfyui-manager: https://github.com/Comfy-Org/ComfyUI-Manager
.. _github_comfyui: https://github.com/comfyanonymous/ComfyUI
.. _github_comfy-cli: https://github.com/Comfy-Org/comfy-cli
.. _github_fbcache: https://github.com/chengzeyi/ParaAttention?tab=readme-ov-file#first-block-cache-our-dynamic-caching
.. _github_comfyui-manager_missing-nodes-installation: https://github.com/ltdrdata/ComfyUI-Manager?tab=readme-ov-file#support-of-missing-nodes-installation
.. _github_comfyui_controlnet_aux: https://github.com/Fannovel16/comfyui_controlnet_aux
.. _hf_nunchaku: https://huggingface.co/nunchaku-tech
.. _hf_lora_ghibsky: https://huggingface.co/aleksa-codes/flux-ghibsky-illustration
.. _hf_lora_flux-turbo: https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha
.. _hf_cn-union-pro: https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro
.. _hf_cn-union-pro2: https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0
.. _hf_flux-kontext-dev: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
.. _hf_nunchaku-flux1-dev-fp4: https://huggingface.co/mit-han-lab/nunchaku-flux.1-dev/blob/main/svdq-fp4_r32-flux.1-dev.safetensors
.. _hf_nunchaku-flux1-dev-int4: https://huggingface.co/mit-han-lab/nunchaku-flux.1-dev/blob/main/svdq-int4_r32-flux.1-dev.safetensors
.. _hf_depth_anything: https://huggingface.co/LiheYoung/depth-anything-large-hf
.. _hf_nunchaku_wheels: https://huggingface.co/nunchaku-tech/nunchaku
.. _pytorch_home: https://pytorch.org/
.. _nunchaku_windows_tutorial_en: https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0
.. _nunchaku_windows_tutorial_zh: https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee
.. _download_cuda: https://developer.nvidia.com/cuda-downloads
.. _visual_studio: https://visualstudio.microsoft.com/visual-cpp-build-tools/
.. _download_git_win: https://git-scm.com/downloads/win
.. _download_anaconda: https://www.anaconda.com/download/success
.. _ms_nunchaku: https://modelscope.cn/organization/nunchaku-tech
.. _ms_nunchaku_wheels: https://modelscope.cn/models/nunchaku-tech/nunchaku
.. _paper_svdquant: http://arxiv.org/abs/2411.05007
.. _paper_pulid: https://arxiv.org/abs/2404.16022