Unverified commit f4f11133, authored by Muyang Li and committed by GitHub.

feat: support AWQ 4-bit T5 and single-file model loading in ComfyUI (#421)

* better linter

* update

* remove merge t5; update the nightly-build.yaml workflow

* fix the workflow name

* no __metadata__ key

* remember to remove the files

* make linter happy

* check hardware compatibility

* ready to add tests

* update the README

* update the README
parent 02683930
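
The headline change here is AWQ 4-bit T5 support with single-file (`.safetensors`) model loading. As a rough sketch of what this enables (the `NunchakuT5EncoderModel` class and checkpoint path are taken from this commit's metadata script; the exact `from_pretrained` signature is an assumption):

```python
# Hedged sketch: load the AWQ 4-bit T5 text encoder from a single safetensors
# file and plug it into a FLUX pipeline as the secondary text encoder.
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuT5EncoderModel  # assumed import path

text_encoder_2 = NunchakuT5EncoderModel.from_pretrained(
    "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50).images[0]
image.save("flux.1-dev-awq-t5.png")
```
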
BasedOnStyle: LLVM # K&R / "attach" braces like the code now
IndentWidth: 4 # 4‑space indents everywhere
TabWidth: 4
UseTab: Never # never convert to tabs
ColumnLimit: 120
AccessModifierOffset: -4
BreakBeforeBraces: Attach # `void foo() {` — brace on same line
BraceWrapping:
AfterNamespace: false # `namespace x {` on same line
SplitEmptyFunction: false
SplitEmptyRecord: false
SplitEmptyNamespace: false
PointerAlignment: Right # `int *ptr`, `const Foo *bar`
ReferenceAlignment: Pointer # `int &ref` -> same rule as pointers
SortIncludes: false # keep the hand‑crafted include order
IncludeBlocks: Preserve
SortUsingDeclarations: false
IndentPPDirectives: None # keep `#pragma` / `#if` at column 0
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortBlocksOnASingleLine: false
BinPackParameters: false # one parameter per line (as written)
BinPackArguments: false
AlignAfterOpenBracket: Align # preserve the current hanging‑indent style
AlignConsecutiveAssignments: true
AlignConsecutiveDeclarations: false
SpaceAfterTemplateKeyword: false
BreakTemplateDeclarations: Yes
......@@ -3,37 +3,36 @@ name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
- type: checkboxes
attributes:
label: Checklist
options:
- label: 1. I have searched for related issues and FAQs (https://github.com/mit-han-lab/nunchaku/blob/main/docs/faq.md) but was unable to find a solution.
- label: 2. The issue persists in the latest version.
- label: 3. Please note that without environment information and a minimal reproducible example, it will be difficult for us to reproduce and address the issue, which may delay our response.
- label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
- label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
- label: 6. I will do my best to describe the issue in English.
- type: textarea
attributes:
label: Describe the Bug
description: Provide a clear and concise explanation of the bug you encountered.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
Please include relevant environment details such as your system specifications, Python version, PyTorch version, and CUDA version.
placeholder: "Example: Ubuntu 24.04, Python 3.11, PyTorch 2.6, CUDA 12.4"
validations:
required: true
- type: textarea
attributes:
label: Reproduction Steps
description: |
What command or script did you execute? Which **model** were you using?
placeholder: "Example: python run_model.py --config config.json"
validations:
required: true
......@@ -2,23 +2,22 @@
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
- type: checkboxes
attributes:
label: Checklist
options:
- label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
- label: 2. I will do my best to describe the issue in English.
- type: textarea
attributes:
label: Motivation
description: |
A clear and concise description of the motivation of the feature.
validations:
required: true
- type: textarea
attributes:
label: Related resources
description: |
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
name: Auto-merge main into dev
on:
workflow_dispatch:
push:
branches:
- main
permissions:
contents: write
jobs:
merge-main-into-dev:
runs-on: ubuntu-latest
if: github.repository == 'mit-han-lab/nunchaku'
steps:
- name: Checkout the repository
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GH_TOKEN }}
- name: Check if main and dev are already in sync
id: check_sync
run: |
......@@ -36,7 +31,6 @@ jobs:
echo "Branches differ. Proceeding with merge."
echo "skip_merge=false" >> "$GITHUB_OUTPUT"
fi
- name: Merge main into dev
id: last_commit
if: steps.check_sync.outputs.skip_merge == 'false'
......
name: Clean Old Nightly Releases
on:
schedule:
- cron: '0 6 * * *' # daily at 06:00 UTC
workflow_dispatch:
permissions:
contents: write
jobs:
cleanup:
name: Delete old nightly releases and tags
runs-on: ubuntu-latest
if: github.repository == 'mit-han-lab/nunchaku'
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: List all nightly releases
id: list
run: |
......@@ -26,14 +21,12 @@ jobs:
echo "Found $(wc -l < nightly_tags.txt) nightly releases."
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Trim to old tags beyond latest 30
id: filter
run: |
tail -n +31 nightly_tags.txt > to_delete.txt || true
echo "Tags to delete:"
cat to_delete.txt || echo "(none)"
- name: Delete releases and tags
run: |
while read tag; do
......@@ -43,6 +36,5 @@ jobs:
done < to_delete.txt
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Done
run: echo "Nightly cleanup completed."
# Borrowed from https://github.com/sgl-project/sglang/blob/main/.github/workflows/close-inactive-issues.yml
name: Close Inactive Issues
on:
schedule:
- cron: '0 0 * * *'
workflow_dispatch:
permissions:
issues: write
contents: read
jobs:
close-inactive-issues:
if: github.repository == 'mit-han-lab/nunchaku'
......
name: Lint
on:
push:
branches:
- main
- dev
pull_request:
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install pre-commit hook
run: |
python -m pip install pre-commit
pre-commit install
- name: Linting
run: pre-commit run --all-files
name: Nightly Build
on:
schedule:
- cron: '0 8 * * *' # UTC time
workflow_dispatch:
permissions:
contents: write
jobs:
tag:
name: Tag dev branch if dev version
......@@ -22,51 +19,38 @@ jobs:
with:
fetch-depth: 0
ref: dev
- name: Extract version from __version__.py
id: version
run: |
version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
echo "Extracted version: $version"
echo "version=$version" >> "$GITHUB_OUTPUT"
- name: Determine if build is needed
id: check
run: |
if [[ "${{ steps.version.outputs.version }}" == *dev* ]]; then
echo "need_build=true" >> "$GITHUB_OUTPUT"
else
echo "need_build=false" >> "$GITHUB_OUTPUT"
fi
- name: Get latest tag with same version prefix
id: last_tag
if: steps.check.outputs.need_build == 'true'
run: |
prefix="v${{ steps.version.outputs.version }}"
tag=$(git tag --list "${prefix}*" --sort=-creatordate | head -n 1 || echo "")
echo "latest_tag=$tag" >> "$GITHUB_OUTPUT"
- name: Check if current commit is new
id: check_commit_diff
if: steps.check.outputs.need_build == 'true'
run: |
tag=${{ steps.last_tag.outputs.latest_tag }}
if [ -z "$tag" ]; then
echo "No previous tag found."
echo "need_build=true" >> "$GITHUB_OUTPUT"
else
base=$(git rev-parse "$tag")
head=$(git rev-parse HEAD)
if [ "$base" = "$head" ]; then
echo "No new commits since $tag"
echo "need_build=false" >> "$GITHUB_OUTPUT"
version="${{ steps.version.outputs.version }}"
need_build=false
if [[ "$version" == *dev* ]]; then
echo "Version contains 'dev'"
prefix="v$version"
tag=$(git tag --list "${prefix}*" --sort=-creatordate | head -n 1 || echo "")
if [ -z "$tag" ]; then
echo "No previous tag found."
need_build=true
else
echo "New commits found since $tag"
echo "need_build=true" >> "$GITHUB_OUTPUT"
base=$(git rev-parse "$tag")
head=$(git rev-parse HEAD)
if [ "$base" != "$head" ]; then
echo "New commits found since $tag"
need_build=true
else
echo "No new commits since $tag"
fi
fi
else
echo "Version does not contain 'dev'"
fi
echo "need_build=$need_build" >> "$GITHUB_OUTPUT"
- name: Set tag name
id: tag
if: steps.check.outputs.need_build == 'true'
......@@ -75,7 +59,6 @@ jobs:
tag_name="v${{ steps.version.outputs.version }}$today"
echo "tag_name=$tag_name"
echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
- name: Create and push tag
if: steps.check.outputs.need_build == 'true'
run: |
......@@ -83,11 +66,9 @@ jobs:
git config user.email "github-actions@users.noreply.github.com"
git tag ${{ steps.tag.outputs.tag_name }}
git push origin ${{ steps.tag.outputs.tag_name }}
- name: Skip tagging (version is not dev or no new commits)
if: steps.check.outputs.need_build == 'false'
run: echo "Version is not a dev version. Skipping tag."
run: echo "Version is not a dev version or no new commits. Skipping tag."
linux-wheels:
name: Build the linux nightly wheels
runs-on: [self-hosted, linux-build]
......@@ -97,7 +78,6 @@ jobs:
matrix:
python: ["3.10", "3.11", "3.12"]
torch: ["2.5", "2.6", "2.7"]
steps:
- name: Checkout to the tag
uses: actions/checkout@v4
......@@ -105,10 +85,8 @@ jobs:
fetch-depth: 0
ref: ${{ needs.tag.outputs.tag_name }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Build wheels
run: |
if [[ "${{ matrix.torch }}" == "2.7" ]]; then
......@@ -117,7 +95,6 @@ jobs:
cuda_version="12.4"
fi
bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
- name: Upload wheels to GitHub Release
uses: softprops/action-gh-release@v2
with:
......@@ -127,21 +104,18 @@ jobs:
prerelease: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Clean up
if: always() && github.repository == 'mit-han-lab/nunchaku'
run: bash scripts/linux_cleanup.sh
windows-wheels:
name: Build the windows nightly wheels
runs-on: [self-hosted, windows-build]
needs: tag
if: needs.tag.outputs.need_build == 'true' && github.repository == 'mit-han-lab/nunchaku'
strategy:
matrix:
python: [ "3.10", "3.11", "3.12" ]
torch: [ "2.5", "2.6", "2.7" ]
python: ["3.10", "3.11", "3.12"]
torch: ["2.5", "2.6", "2.7"]
steps:
- name: Checkout to the tag
uses: actions/checkout@v4
......@@ -149,10 +123,8 @@ jobs:
fetch-depth: 0
ref: ${{ needs.tag.outputs.tag_name }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Build wheels
shell: cmd
run: |
......@@ -164,7 +136,6 @@ jobs:
)
call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
- name: Upload wheels to GitHub Release
uses: softprops/action-gh-release@v2
with:
......
name: Release Build
on:
workflow_dispatch:
permissions:
contents: write
jobs:
release:
name: Tag Main Branch and Create Release
......@@ -19,14 +16,12 @@ jobs:
with:
fetch-depth: 0
ref: main
- name: Extract version from __version__.py
id: version
run: |
version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
echo "Extracted version: $version"
echo "version=$version" >> "$GITHUB_OUTPUT"
- name: Create and push tag
id: tag
run: |
......@@ -36,7 +31,6 @@ jobs:
git tag $tag_name
git push origin $tag_name
echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
linux-wheels:
name: Build the linux release wheels
runs-on: [self-hosted, linux-build]
......@@ -45,7 +39,6 @@ jobs:
matrix:
python: ["3.10", "3.11", "3.12"]
torch: ["2.5", "2.6", "2.7"]
steps:
- name: Checkout to the tag
uses: actions/checkout@v4
......@@ -53,10 +46,8 @@ jobs:
fetch-depth: 0
ref: ${{ needs.release.outputs.tag_name }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Build wheels
run: |
if [[ "${{ matrix.torch }}" == "2.7" ]]; then
......@@ -65,7 +56,6 @@ jobs:
cuda_version="12.4"
fi
bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
- name: Upload wheels to GitHub Release
uses: softprops/action-gh-release@v2
with:
......@@ -75,20 +65,17 @@ jobs:
prerelease: false
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Clean up
if: always()
run: bash scripts/linux_cleanup.sh
windows-wheels:
name: Build the windows release wheels
runs-on: [self-hosted, windows-build]
needs: release
strategy:
matrix:
python: [ "3.10", "3.11", "3.12" ]
torch: [ "2.5", "2.6", "2.7" ]
python: ["3.10", "3.11", "3.12"]
torch: ["2.5", "2.6", "2.7"]
steps:
- name: Checkout to the tag
uses: actions/checkout@v4
......@@ -96,10 +83,8 @@ jobs:
fetch-depth: 0
ref: ${{ needs.release.outputs.tag_name }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Build wheels
shell: cmd
run: |
......@@ -111,7 +96,6 @@ jobs:
)
call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
- name: Upload wheels to GitHub Release
uses: softprops/action-gh-release@v2
with:
......
name: Synchronize to Private Repository
on:
workflow_dispatch:
push:
branches:
- dev
permissions:
contents: write
jobs:
cherry-pick-commits:
runs-on: ubuntu-latest
if: github.repository == 'mit-han-lab/nunchaku'
steps:
- name: Clone private repository
run: |
git clone https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku-dev.git
- name: Add public remote and fetch
run: |
cd nunchaku-dev
git remote add public https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku.git
git fetch public dev
- name: Cherry-pick latest commit from public/dev
run: |
set -e
......@@ -94,7 +88,6 @@ jobs:
done
git commit --amend --allow-empty -m "$NEW_MSG" --author="$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
- name: Push to the private main branch
run: |
cd nunchaku-dev
......
name: Ampere Tests
on:
workflow_dispatch:
inputs:
......@@ -10,11 +9,9 @@ on:
options:
- pr
- branch
pr_number:
description: 'Pull Request Number (only if test_target == "pr")'
required: false
branch_name:
description: 'Branch name (only if test_target == "branch")'
default: 'main'
......@@ -39,11 +36,10 @@ on:
concurrency:
group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-comment:
if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
runs-on: [self-hosted, ampere]
outputs:
should_run: ${{ steps.check.outputs.should_run }}
steps:
......@@ -56,12 +52,10 @@ jobs:
else
echo "should_run=false" >> $GITHUB_OUTPUT
fi
run-tests:
runs-on: [self-hosted, ampere]
needs: [check-comment]
if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
steps:
- name: Determine ref
id: set-ref
......@@ -76,16 +70,13 @@ jobs:
with:
ref: ${{ steps.set-ref.outputs.ref }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Set up Python
run: |
which python
echo "Setting up Python with Conda"
conda create -n test_env python=3.11 -y
- name: Install dependencies
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -95,7 +86,6 @@ jobs:
echo "Installing dependencies"
pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
- name: Build
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -103,7 +93,6 @@ jobs:
which python
NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
pip install -r tests/requirements.txt
- name: Setup ComfyUI
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -127,7 +116,6 @@ jobs:
pip install -r nunchaku_tests/requirements.txt
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
- name: Run ComfyUI tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -136,7 +124,6 @@ jobs:
cd ../ComfyUI
python nunchaku_tests/scripts/nunchaku_flux1_dev.py
pytest -v nunchaku_tests/
- name: Nunchaku FLUX memory tests
run: |
pwd
......@@ -144,28 +131,24 @@ jobs:
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
- name: Nunchaku FLUX example tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
- name: Nunchaku FLUX other tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
- name: Nunchaku SANA tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
- name: clean up
if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
run: |
......
name: Blackwell Tests
on:
workflow_dispatch:
inputs:
......@@ -10,11 +9,9 @@ on:
options:
- pr
- branch
pr_number:
description: 'Pull Request Number (only if test_target == "pr")'
required: false
branch_name:
description: 'Branch name (only if test_target == "branch")'
default: 'main'
......@@ -39,11 +36,10 @@ on:
concurrency:
group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-comment:
if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
runs-on: [self-hosted, blackwell]
outputs:
should_run: ${{ steps.check.outputs.should_run }}
steps:
......@@ -56,12 +52,10 @@ jobs:
else
echo "should_run=false" >> $GITHUB_OUTPUT
fi
run-tests:
runs-on: [self-hosted, blackwell]
needs: [check-comment]
if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
steps:
- name: Determine ref
id: set-ref
......@@ -76,16 +70,13 @@ jobs:
with:
ref: ${{ steps.set-ref.outputs.ref }}
submodules: true
- name: Show current commit
run: git log -1 --oneline
- name: Set up Python
run: |
which python
echo "Setting up Python with Conda"
conda create -n test_env python=3.11 -y
- name: Install dependencies
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -95,7 +86,6 @@ jobs:
echo "Installing dependencies"
pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
- name: Build
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -103,7 +93,6 @@ jobs:
which python
NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
pip install -r tests/requirements.txt
- name: Setup ComfyUI
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -127,7 +116,6 @@ jobs:
pip install -r nunchaku_tests/requirements.txt
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
- name: Run ComfyUI tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
......@@ -136,7 +124,6 @@ jobs:
cd ../ComfyUI
python nunchaku_tests/scripts/nunchaku_flux1_dev.py
pytest -v nunchaku_tests/
- name: Nunchaku FLUX memory tests
run: |
pwd
......@@ -144,28 +131,24 @@ jobs:
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
- name: Nunchaku FLUX example tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
- name: Nunchaku FLUX other tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
- name: Nunchaku SANA tests
run: |
source $(conda info --base)/etc/profile.d/conda.sh
conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python
NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
- name: clean up
if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
run: |
......
# Adapted from https://github.com/sgl-project/sglang/blob/main/.pre-commit-config.yaml
default_stages: [pre-commit, pre-push, manual]
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
......@@ -10,15 +9,15 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
args: [--allow-multiple-documents]
- id: check-toml
- id: check-ast
- id: check-added-large-files
- id: check-merge-conflict
# - id: check-shebang-scripts-are-executable
- id: detect-private-key
# - id: debug-statements
# - id: no-commit-to-branch
- repo: https://github.com/PyCQA/isort
rev: 5.13.2
hooks:
......@@ -27,7 +26,7 @@ repos:
rev: v0.11.2
hooks:
- id: ruff
args: [--fixable=F401]
files: ^(nunchaku/|examples/|tests/|app/)
exclude: \.ipynb$
- repo: https://github.com/psf/black
......@@ -35,14 +34,14 @@ repos:
hooks:
- id: black-jupyter
- id: black
args: [ -l, "120" ]
args: [-l, "120"]
files: ^(nunchaku/|examples/|tests/|app/)
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v20.1.3
hooks:
- id: clang-format
types_or: [c++, cuda]
args: [--style=file, --verbose]
- repo: https://github.com/kynan/nbstripout
rev: 0.8.1
hooks:
......@@ -50,3 +49,12 @@ repos:
args:
- '--keep-output'
- '--extra-keys=metadata.kernelspec metadata.language_info.version'
- repo: https://github.com/google/yamlfmt
rev: v0.17.0
hooks:
- id: yamlfmt
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.22
hooks:
- id: mdformat
name: (Markdown) Format docs with mdformat
......@@ -2,7 +2,7 @@
<img src="https://raw.githubusercontent.com/mit-han-lab/nunchaku/477953fa1dd6f082fbec201cea7c7430117a810e/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/Nunchaku-519fed7f9de94e"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
......@@ -18,15 +18,11 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist with installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
- **[2025-03-17]** 🚀 Released the NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools models, and upgraded the INT4 FLUX.1-tools models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
- **[2025-03-13]** 📦 Separated the ComfyUI node into a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation and released node v0.1.6! Plus, [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) is now fully supported!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 Released!** We've added support for a [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to just **4 GiB** while maintaining a **2–3× speedup**. This update also fixes various issues related to resolution, LoRA, pin memory, and runtime stability. Check out the release notes for full details!
- **[2025-02-20]** 🚀 Released [pre-built wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! Check [here](#Installation) for guidance!
- **[2025-02-20]** 🚀 **Support for NVFP4 precision on the NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering a **~3× speedup** on the RTX 5090 over BF16. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out the [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
- **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online—try it out!
<details>
<summary>More</summary>
......@@ -53,23 +49,24 @@ https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
#### Quantization Method -- SVDQuant
![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)Overview of SVDQuant. Stage 1: Originally, both the activation $\boldsymbol{X}$ and weights $\boldsymbol{W}$ contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, resulting in the updated activation $\hat{\boldsymbol{X}}$ and weights $\hat{\boldsymbol{W}}$. While $\hat{\boldsymbol{X}}$ becomes easier to quantize, $\hat{\boldsymbol{W}}$ now becomes more difficult. Stage 3: SVDQuant further decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
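
To make Stage 3 concrete, below is a small NumPy sketch of the decomposition. It is illustrative only: a toy per-tensor symmetric quantizer stands in for the group-wise INT4/NVFP4 kernels that SVDQuant actually uses.

```python
# Toy illustration of SVDQuant's Stage 3: W_hat ≈ L1 @ L2 + residual, where the
# low-rank branch stays in 16-bit and only the residual is quantized to 4 bits.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic weight with one strong outlier direction, mimicking the Stage 2 output.
W_hat = rng.standard_normal((512, 512)).astype(np.float32)
W_hat += 2.0 * np.outer(rng.standard_normal(512), rng.standard_normal(512)).astype(np.float32)
x = rng.standard_normal(512).astype(np.float32)

def quant4(w):
    """Toy symmetric 4-bit quantizer (per-tensor; the real kernels use group-wise scales)."""
    scale = np.abs(w).max() / 7.0
    return np.clip(np.round(w / scale), -8, 7) * scale

# Rank-32 SVD branch absorbs the outliers; the residual is easier to quantize.
U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
rank = 32
L1, L2 = U[:, :rank] * S[:rank], Vt[:rank, :]
residual = W_hat - L1 @ L2

y_ref = W_hat @ x
y_svdq = quant4(residual) @ x + L1 @ (L2 @ x)  # 4-bit branch + 16-bit low-rank branch
y_naive = quant4(W_hat) @ x                    # 4-bit quantization without the branch
print("rel. error with low-rank branch:", np.linalg.norm(y_svdq - y_ref) / np.linalg.norm(y_ref))
print("rel. error without it:          ", np.linalg.norm(y_naive - y_ref) / np.linalg.norm(y_ref))
```

Because the rank-32 branch captures the dominant outlier directions, the residual has a much smaller dynamic range and quantizes with visibly less error.
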
#### Nunchaku Engine Design
![engine](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku optimizes this overhead with kernel fusion. (b) The *Down Projection* and *Quantize* kernels share the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data-movement overhead, we fuse the first two and the latter two kernels together.
## Performance
![efficiency](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels
#### Prerequisites
Before installation, ensure you have [PyTorch>=2.5](https://pytorch.org/) installed. For example, you can use the following command to install PyTorch 2.6:
```shell
......@@ -77,6 +74,7 @@ pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### Install nunchaku
Once PyTorch is installed, you can directly install `nunchaku` from [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main), [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku) or [GitHub release](https://github.com/mit-han-lab/nunchaku/releases). Be sure to select the appropriate wheel for your Python and PyTorch version. For example, for Python 3.11 and PyTorch 2.6:
```shell
......@@ -111,12 +109,11 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
**Note**:
- Make sure your CUDA version is **at least 12.2 on Linux** and **at least 12.6 on Windows**. If you're using a Blackwell GPU (e.g., 50-series GPUs), CUDA **12.8 or higher is required**.
- For Windows users, please refer to [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) for instructions. Please upgrade your MSVC compiler to the latest version.
- We currently support only NVIDIA GPUs with architectures sm_75 (Turing: RTX 2080), sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100); a quick capability check is sketched below. See [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.
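
A minimal way to check what your GPU reports before building (the supported set below simply mirrors the note above; `torch.cuda.get_device_capability` is a standard PyTorch call):

```python
# Minimal sketch: check the GPU's compute capability against the architectures
# listed above (sm_75 / sm_80 / sm_86 / sm_89).
import torch

SUPPORTED = {75: "Turing", 80: "A100", 86: "Ampere", 89: "Ada"}

major, minor = torch.cuda.get_device_capability()
sm = 10 * major + minor
name = torch.cuda.get_device_name()
if sm in SUPPORTED:
    print(f"{name}: sm_{sm} ({SUPPORTED[sm]}) is in the supported list.")
else:
    print(f"{name}: sm_{sm} is not in the list above; see the linked issue.")
```
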
1. Install dependencies:
......@@ -136,32 +133,32 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
2. Install `nunchaku` package:
Make sure you have `gcc/g++>=11`. If you don't, you can install it via Conda on Linux:
```shell
conda install -c conda-forge gxx=11 gcc=11
```
For Windows users, you can download and install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
Then build the package from source with
```shell
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
python setup.py develop
```
If you are building wheels for distribution, use:
```shell
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```
Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
## Usage Example
......@@ -285,14 +282,14 @@ Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/Co
## Gradio Demos
- FLUX.1 Models
  - Text-to-image: see [`app/flux.1/t2i`](app/flux.1/t2i).
  - Sketch-to-Image ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)): see [`app/flux.1/sketch`](app/flux.1/sketch).
  - Depth/Canny-to-Image ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)): see [`app/flux.1/depth_canny`](app/flux.1/depth_canny).
  - Inpainting ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)): see [`app/flux.1/fill`](app/flux.1/fill).
  - Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)): see [`app/flux.1/redux`](app/flux.1/redux).
- SANA:
  - Text-to-image: see [`app/sana/t2i`](app/sana/t2i).
## Customized Model Quantization
......@@ -307,6 +304,7 @@ Please refer to [app/flux/t2i/README.md](app/flux/t2i/README.md) for instruction
Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the roadmap for April.
## Contribution
We warmly welcome contributions from the community! To get started, please refer to our [contribution guide](docs/contribution_guide.md) for instructions on how to contribute code to Nunchaku.
## Troubleshooting
......@@ -319,13 +317,13 @@ For enterprises interested in adopting SVDQuant or Nunchaku, including technical
## Related Projects
- [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
- [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
- [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
- [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
- [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Citation
......
......@@ -2,7 +2,7 @@
<img src="https://raw.githubusercontent.com/mit-han-lab/nunchaku/477953fa1dd6f082fbec201cea7c7430117a810e/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>论文</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>官网</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>博客</b></a> | <a href="https://svdquant.mit.edu"><b>演示</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
<a href="http://arxiv.org/abs/2411.05007"><b>论文</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>官网</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>博客</b></a> | <a href="https://svdquant.mit.edu"><b>演示</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/Nunchaku-519fed7f9de94e"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
......@@ -18,13 +18,10 @@
- **[2025-04-16]** 🎥 发布了[**英文**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0)和[**中文**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee)教程视频,协助安装和使用Nunchaku。
- **[2025-04-09]** 📢 发布[四月开发路线图](https://github.com/mit-han-lab/nunchaku/issues/266)和[常见问题解答](https://github.com/mit-han-lab/nunchaku/discussions/262),帮助社区快速上手并了解Nunchaku最新进展。
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 发布!** 支持[**多LoRA融合**](examples/flux.1-dev-multiple-lora.py)和[**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py),通过[**FP16 attention**](#fp16-attention)和[**First-Block Cache**](#first-block-cache)实现更快的推理速度。新增[**20系显卡支持**](examples/flux.1-dev-turing.py),覆盖更多用户!
- **[2025-03-17]** 🚀 发布NVFP4 4-bit量化版[Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar)和FLUX.1工具集,升级INT4 FLUX.1工具模型。从[HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c)或[ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641)下载更新!
- **[2025-03-13]** 📦 ComfyUI节点[独立仓库](https://github.com/mit-han-lab/ComfyUI-nunchaku)发布,安装更便捷!节点版本v0.1.6上线,全面支持[4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar)!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 发布!** 支持4-bit文本编码器和分层CPU offloading,FLUX最低显存需求降至**4 GiB**,同时保持**2–3倍加速**。修复分辨率、LoRA、内存锁定等稳定性问题,详情见更新日志!
- **[2025-02-20]** 🚀 发布[预编译wheel包](https://huggingface.co/mit-han-lab/nunchaku),简化安装步骤!查看[安装指南](#安装指南)!
- **[2025-02-20]** 🚀 **NVIDIA RTX 5090支持NVFP4精度!** 相比INT4,NVFP4画质更优,在RTX 5090上比BF16快**约3倍**。[博客详解](https://hanlab.mit.edu/blog/svdquant-nvfp4)、[示例代码](./examples)和[在线演示](https://svdquant.mit.edu/flux1-schnell/)已上线!
- **[2025-02-18]** 🔥 新增[自定义LoRA转换](#自定义lora)和[模型量化](#自定义模型量化)指南![ComfyUI](./comfyui)工作流支持**自定义LoRA**和**FLUX.1工具集**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007)入选ICLR 2025 Spotlight!FLUX.1工具集使用演示上线!** [使用演示](#使用演示)已更新![深度图生成演示](https://svdquant.mit.edu/flux1-depth-dev/)同步开放!
<details>
<summary>更多动态</summary>
......@@ -52,7 +49,7 @@ https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
#### 量化方法 -- SVDQuant
![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)SVDQuant三阶段示意图。阶段1:原始激活 $\boldsymbol{X}$ 和权重 $\boldsymbol{W}$ 均含异常值,4-bit量化困难。阶段2:将激活异常值迁移至权重,得到更易量化的激活 $\hat{\boldsymbol{X}}$ 和更难量化的权重 $\hat{\boldsymbol{W}}$ 。阶段3:通过SVD将 $\hat{\boldsymbol{W}}$ 分解为低秩分量 $\boldsymbol{L}_1\boldsymbol{L}_2$ 和残差 $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ ,低秩分支以16位精度运行缓解量化难度。
#### Nunchaku引擎设计
......@@ -69,6 +66,7 @@ https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
### Wheel包安装
#### 前置条件
确保已安装 [PyTorch>=2.5](https://pytorch.org/)。例如:
```shell
......@@ -76,6 +74,7 @@ pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### 安装nunchaku
[Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main)[ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku)[GitHub release](https://github.com/mit-han-lab/nunchaku/releases)选择对应Python和PyTorch版本的wheel。例如Python 3.11和PyTorch 2.6:
```shell
......@@ -110,9 +109,9 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
**注意**
- Linux需CUDA≥12.2,Windows需CUDA≥12.6。Blackwell显卡需CUDA≥12.8。
- Windows用户请参考[此问题](https://github.com/mit-han-lab/nunchaku/issues/6)升级MSVC编译器。
- 支持SM_75(Turing:RTX 2080)、SM_86(Ampere:RTX 3090)、SM_89(Ada:RTX 4090)、SM_80(A100)架构显卡,详见[此问题](https://github.com/mit-han-lab/nunchaku/issues/1)
1. 安装依赖:
......@@ -132,32 +131,32 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
2. 编译安装:
确保`gcc/g++≥11`。Linux用户可通过Conda安装:
```shell
conda install -c conda-forge gxx=11 gcc=11
```
Windows用户请安装最新[Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false)。
编译命令:
```shell
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
python setup.py develop
```
打包wheel:
```shell
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```
设置`NUNCHAKU_INSTALL_MODE=ALL`确保wheel支持所有显卡架构。
## 使用示例
......@@ -179,7 +178,7 @@ image = pipeline("举着'Hello World'标牌的猫咪", num_inference_steps=50, g
image.save(f"flux.1-dev-{precision}.png")
```
**注意****Turing显卡用户(如20系列)**需设置`torch_dtype=torch.float16`并使用`nunchaku-fp16`注意力模块,完整示例见[`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py)
**注意**\*\*Turing显卡用户(如20系列)\*\*需设置`torch_dtype=torch.float16`并使用`nunchaku-fp16`注意力模块,完整示例见[`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py)
### FP16 Attention
......@@ -281,14 +280,14 @@ Nunchaku 支持 [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) 和 [FL
## 使用演示
- FLUX.1 模型
  - 文生图:见 [`app/flux.1/t2i`](app/flux.1/t2i)
  - 草图生成图像 ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)):见 [`app/flux.1/sketch`](app/flux.1/sketch)
  - 深度/Canny 边缘生成图像 ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)):见 [`app/flux.1/depth_canny`](app/flux.1/depth_canny)
  - 修复 ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)):见 [`app/flux.1/fill`](app/flux.1/fill)
  - Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)):见 [`app/flux.1/redux`](app/flux.1/redux)
- SANA:
  - 文生图:见 [`app/sana/t2i`](app/sana/t2i)
## 自定义模型量化
......@@ -303,6 +302,7 @@ Nunchaku 支持 [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) 和 [FL
请查看 [此处](https://github.com/mit-han-lab/nunchaku/issues/266) 获取四月的路线图。
## 贡献
我们诚挚欢迎社区贡献!请参阅[贡献指南](docs/contribution_guide_ZH.md)了解如何为 Nunchaku 贡献代码。
## 问题排查
......@@ -315,13 +315,13 @@ Nunchaku 支持 [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) 和 [FL
## 相关项目
- [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
- [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
- [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
- [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
- [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## 引用
......
import json
from pathlib import Path
import yaml
from safetensors.torch import save_file
from tqdm import tqdm
from nunchaku.utils import load_state_dict_in_safetensors
def load_yaml(path: str | Path) -> dict:
    with open(path, "r", encoding="utf-8") as file:
        data = yaml.safe_load(file)
    return data
if __name__ == "__main__":
    # data = load_yaml("nunchaku_models.yaml")
    # for model in tqdm(data["diffusion_models"]):
    #     for precision in ["int4", "fp4"]:
    #         repo_id = model["repo_id"]
    #         filename = model["filename"].format(precision=precision)
    #         sd, metadata = load_state_dict_in_safetensors(Path(repo_id) / filename, return_metadata=True)
    #         metadata["model_class"] = "NunchakuFluxTransformer2dModel"
    #         quantization_config = {
    #             "method": "svdquant",
    #             "weight": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #             "activation": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #         }
    #         metadata["quantization_config"] = json.dumps(quantization_config)
    #         output_dir = Path("nunchaku-models") / Path(repo_id).name
    #         output_dir.mkdir(parents=True, exist_ok=True)
    #         save_file(sd, output_dir / filename, metadata=metadata)

    # sd, metadata = load_state_dict_in_safetensors(
    #     "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors", return_metadata=True
    # )
    # metadata["model_class"] = "NunchakuT5EncoderModel"
    # quantization_config = {"method": "awq", "weight": {"dtype": "int4", "scale_dtype": None, "group_size": 128}}
    # output_dir = Path("nunchaku-models") / "nunchaku-t5"
    # output_dir.mkdir(parents=True, exist_ok=True)
    # save_file(sd, output_dir / "awq-int4-flux.1-t5xxl.safetensors", metadata=metadata)
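
    # The active block below applies the same treatment to the SANA checkpoint:
    # load the state dict, record the loader class and quantization scheme in
    # the safetensors metadata, and re-save the file for single-file loading.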
    sd, metadata = load_state_dict_in_safetensors(
        "mit-han-lab/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors", return_metadata=True
    )
    metadata["model_class"] = "NunchakuSanaTransformer2DModel"
    precision = "int4"
    quantization_config = {
        "method": "svdquant",
        "weight": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
        "activation": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
    }
    metadata["quantization_config"] = json.dumps(quantization_config)
    output_dir = Path("nunchaku-models") / "nunchaku-sana"
    output_dir.mkdir(parents=True, exist_ok=True)
    save_file(sd, output_dir / "svdq-int4_r32-sana1.6b.safetensors", metadata=metadata)
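
To sanity-check the embedded metadata, it can be read back with safetensors' `safe_open` without loading any tensors (a minimal sketch; the path assumes the script above has been run):

```python
import json

from safetensors import safe_open

path = "nunchaku-models/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors"
with safe_open(path, framework="pt") as f:
    meta = f.metadata()

print(meta["model_class"])                      # NunchakuSanaTransformer2DModel
print(json.loads(meta["quantization_config"]))  # {'method': 'svdquant', ...}
```
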
......@@ -6,8 +6,8 @@ This interactive Gradio application transforms your uploaded image into a differ
The base models are:
- [FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) (preserves depth map)
- [FLUX.1-Canny-dev](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) (preserves Canny edge)
First you need to install some dependencies:
......@@ -22,7 +22,7 @@ Then run:
python run_gradio.py
```
- By default, the model is `FLUX.1-Depth-dev`. You can add `-m canny` to switch to `FLUX.1-Canny-dev`.
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
......@@ -8,6 +8,6 @@ This interactive Gradio application allows you to interactively inpaint an uploa
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
......@@ -8,4 +8,4 @@ This interactive Gradio application allows you to interactively generate image v
python run_gradio.py
```
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
......@@ -10,6 +10,6 @@ To launch the application, simply run:
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.