Unverified commit f4f11133, authored by Muyang Li, committed by GitHub

feat: support AWQ 4-bit T5 and single-file model loading in ComfyUI (#421)

* better linter

* update

* remove merge t5; update the nightly-build.yaml workflow

* fix the workflow name

* no __metadata__ key

* remember to remove the files

* make linter happy

* check hardware compatibility

* ready to add tests

* update the README

* update the README
parent 02683930
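
For context on the headline feature, loading the AWQ 4-bit T5 encoder into a FLUX pipeline looks roughly like the sketch below. The class names and model IDs are assumptions drawn from the project's example scripts (e.g. `NunchakuT5EncoderModel`, `mit-han-lab/svdq-flux.1-t5`), not verified against this exact commit; check the `examples/` directory for the authoritative usage.

```python
# Hedged sketch: class names and model IDs below are assumptions based on the
# project's example scripts, not verified against this exact commit.
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
text_encoder_2 = NunchakuT5EncoderModel.from_pretrained("mit-han-lab/svdq-flux.1-t5")  # AWQ 4-bit T5
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50).images[0]
image.save("flux.1-dev-qencoder.png")
```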
@@ -3,36 +3,26 @@ IndentWidth: 4 # 4-space indents everywhere
TabWidth: 4
UseTab: Never # never convert to tabs
ColumnLimit: 120
AccessModifierOffset: -4
BreakBeforeBraces: Attach # `void foo() {` — brace on same line
BraceWrapping:
  AfterNamespace: false # `namespace x {` on same line
  SplitEmptyFunction: false
  SplitEmptyRecord: false
  SplitEmptyNamespace: false
PointerAlignment: Right # `int *ptr`, `const Foo *bar`
ReferenceAlignment: Pointer # `int &ref` -> same rule as pointers
SortIncludes: false # keep the hand-crafted include order
IncludeBlocks: Preserve
SortUsingDeclarations: false
IndentPPDirectives: None # keep `#pragma` / `#if` at column 0
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortBlocksOnASingleLine: false
BinPackParameters: false # one parameter per line (as written)
BinPackArguments: false
AlignAfterOpenBracket: Align # preserve the current hanging-indent style
AlignConsecutiveAssignments: true
AlignConsecutiveDeclarations: false
SpaceAfterTemplateKeyword: false
BreakTemplateDeclarations: Yes
@@ -3,9 +3,8 @@ name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
@@ -15,13 +14,13 @@ body:
        - label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
        - label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
        - label: 6. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Describe the Bug
      description: Provide a clear and concise explanation of the bug you encountered.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Environment
      description: |
@@ -29,7 +28,7 @@ body:
      placeholder: "Example: Ubuntu 24.04, Python 3.11, PyTorch 2.6, CUDA 12.4"
    validations:
      required: true
  - type: textarea
    attributes:
      label: Reproduction Steps
      description: |
......
@@ -2,22 +2,21 @@
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
        - label: 2. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Motivation
      description: |
        A clear and concise description of the motivation for the feature.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Related resources
      description: |
......
name: Auto-merge main into dev
on:
  workflow_dispatch:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  merge-main-into-dev:
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Checkout the repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_TOKEN }}
      - name: Check if main and dev are already in sync
        id: check_sync
        run: |
@@ -36,7 +31,6 @@ jobs:
            echo "Branches differ. Proceeding with merge."
            echo "skip_merge=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Merge main into dev
        id: last_commit
        if: steps.check_sync.outputs.skip_merge == 'false'
......
name: Clean Old Nightly Releases
on:
  schedule:
    - cron: '* 6 * * *'
  workflow_dispatch:
permissions:
  contents: write
jobs:
  cleanup:
    name: Delete old nightly releases and tags
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: List all nightly releases
        id: list
        run: |
@@ -26,14 +21,12 @@ jobs:
          echo "Found $(wc -l < nightly_tags.txt) nightly releases."
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Trim to old tags beyond latest 30
        id: filter
        run: |
          tail -n +31 nightly_tags.txt > to_delete.txt || true
          echo "Tags to delete:"
          cat to_delete.txt || echo "(none)"
      - name: Delete releases and tags
        run: |
          while read tag; do
@@ -43,6 +36,5 @@ jobs:
          done < to_delete.txt
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Done
        run: echo "Nightly cleanup completed."
# Borrowed from https://github.com/sgl-project/sglang/blob/main/.github/workflows/close-inactive-issues.yml
name: Close Inactive Issues
on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
permissions:
  issues: write
  contents: read
jobs:
  close-inactive-issues:
    if: github.repository == 'mit-han-lab/nunchaku'
......
name: Lint
on:
  push:
    branches:
      - main
      - dev
  pull_request:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install pre-commit hook
        run: |
          python -m pip install pre-commit
          pre-commit install
      - name: Linting
        run: pre-commit run --all-files
name: Nightly Build
on:
  schedule:
    - cron: '0 8 * * *' # UTC time
  workflow_dispatch:
permissions:
  contents: write
jobs:
  tag:
    name: Tag dev branch if dev version
@@ -22,51 +19,38 @@ jobs:
        with:
          fetch-depth: 0
          ref: dev
      - name: Extract version from __version__.py
        id: version
        run: |
          version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
          echo "Extracted version: $version"
          echo "version=$version" >> "$GITHUB_OUTPUT"
-      - name: Check if version contains 'dev'
-        id: check
-        run: |
-          if [[ "${{ steps.version.outputs.version }}" == *dev* ]]; then
-            echo "need_build=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "need_build=false" >> "$GITHUB_OUTPUT"
-          fi
-      - name: Get latest tag with same version prefix
-        id: last_tag
-        if: steps.check.outputs.need_build == 'true'
-        run: |
-          prefix="v${{ steps.version.outputs.version }}"
-          tag=$(git tag --list "${prefix}*" --sort=-creatordate | head -n 1 || echo "")
-          echo "latest_tag=$tag" >> "$GITHUB_OUTPUT"
-      - name: Check if current commit is new
-        id: check_commit_diff
-        if: steps.check.outputs.need_build == 'true'
-        run: |
-          tag=${{ steps.last_tag.outputs.latest_tag }}
-          if [ -z "$tag" ]; then
-            echo "No previous tag found."
-            echo "need_build=true" >> "$GITHUB_OUTPUT"
-          else
-            base=$(git rev-parse "$tag")
-            head=$(git rev-parse HEAD)
-            if [ "$base" = "$head" ]; then
-              echo "No new commits since $tag"
-              echo "need_build=false" >> "$GITHUB_OUTPUT"
-            else
-              echo "New commits found since $tag"
-              echo "need_build=true" >> "$GITHUB_OUTPUT"
-            fi
-          fi
+      - name: Determine if build is needed
+        id: check
+        run: |
+          version="${{ steps.version.outputs.version }}"
+          need_build=false
+          if [[ "$version" == *dev* ]]; then
+            echo "Version contains 'dev'"
+            prefix="v$version"
+            tag=$(git tag --list "${prefix}*" --sort=-creatordate | head -n 1 || echo "")
+            if [ -z "$tag" ]; then
+              echo "No previous tag found."
+              need_build=true
+            else
+              base=$(git rev-parse "$tag")
+              head=$(git rev-parse HEAD)
+              if [ "$base" != "$head" ]; then
+                echo "New commits found since $tag"
+                need_build=true
+              else
+                echo "No new commits since $tag"
+              fi
+            fi
+          else
+            echo "Version does not contain 'dev'"
+          fi
+          echo "need_build=$need_build" >> "$GITHUB_OUTPUT"
      - name: Set tag name
        id: tag
        if: steps.check.outputs.need_build == 'true'
@@ -75,7 +59,6 @@ jobs:
          tag_name="v${{ steps.version.outputs.version }}$today"
          echo "tag_name=$tag_name"
          echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
      - name: Create and push tag
        if: steps.check.outputs.need_build == 'true'
        run: |
@@ -83,11 +66,9 @@ jobs:
          git config user.email "github-actions@users.noreply.github.com"
          git tag ${{ steps.tag.outputs.tag_name }}
          git push origin ${{ steps.tag.outputs.tag_name }}
      - name: Skip tagging (version is not dev or no new commits)
        if: steps.check.outputs.need_build == 'false'
-        run: echo "Version is not a dev version. Skipping tag."
+        run: echo "Version is not a dev version or no new commits. Skipping tag."
  linux-wheels:
    name: Build the linux nightly wheels
    runs-on: [self-hosted, linux-build]
@@ -97,7 +78,6 @@ jobs:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -105,10 +85,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.tag.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        run: |
          if [[ "${{ matrix.torch }}" == "2.7" ]]; then
@@ -117,7 +95,6 @@ jobs:
            cuda_version="12.4"
          fi
          bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
@@ -127,21 +104,18 @@ jobs:
          prerelease: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Clean up
        if: always() && github.repository == 'mit-han-lab/nunchaku'
        run: bash scripts/linux_cleanup.sh
  windows-wheels:
    name: Build the windows nightly wheels
    runs-on: [self-hosted, windows-build]
    needs: tag
    if: needs.tag.outputs.need_build == 'true' && github.repository == 'mit-han-lab/nunchaku'
    strategy:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -149,10 +123,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.tag.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        shell: cmd
        run: |
@@ -164,7 +136,6 @@ jobs:
          )
          call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
          call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
......
name: Release Build
on:
  workflow_dispatch:
permissions:
  contents: write
jobs:
  release:
    name: Tag Main Branch and Create Release
@@ -19,14 +16,12 @@ jobs:
      with:
        fetch-depth: 0
        ref: main
      - name: Extract version from __version__.py
        id: version
        run: |
          version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
          echo "Extracted version: $version"
          echo "version=$version" >> "$GITHUB_OUTPUT"
      - name: Create and push tag
        id: tag
        run: |
@@ -36,7 +31,6 @@ jobs:
          git tag $tag_name
          git push origin $tag_name
          echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
  linux-wheels:
    name: Build the linux release wheels
    runs-on: [self-hosted, linux-build]
@@ -45,7 +39,6 @@ jobs:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -53,10 +46,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.release.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        run: |
          if [[ "${{ matrix.torch }}" == "2.7" ]]; then
@@ -65,7 +56,6 @@ jobs:
            cuda_version="12.4"
          fi
          bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
@@ -75,20 +65,17 @@ jobs:
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Clean up
        if: always()
        run: bash scripts/linux_cleanup.sh
  windows-wheels:
    name: Build the windows release wheels
    runs-on: [self-hosted, windows-build]
    needs: release
    strategy:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -96,10 +83,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.release.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        shell: cmd
        run: |
@@ -111,7 +96,6 @@ jobs:
          )
          call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
          call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
......
name: Synchronize to Private Repository
on:
  workflow_dispatch:
  push:
    branches:
      - dev
permissions:
  contents: write
jobs:
  cherry-pick-commits:
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Clone private repository
        run: |
          git clone https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku-dev.git
      - name: Add public remote and fetch
        run: |
          cd nunchaku-dev
          git remote add public https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku.git
          git fetch public dev
      - name: Cherry-pick latest commit from public/dev
        run: |
          set -e
@@ -94,7 +88,6 @@ jobs:
          done
          git commit --amend --allow-empty -m "$NEW_MSG" --author="$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
      - name: Push to the private main branch
        run: |
          cd nunchaku-dev
......
name: Ampere Tests
on:
  workflow_dispatch:
    inputs:
@@ -10,11 +9,9 @@ on:
        options:
          - pr
          - branch
      pr_number:
        description: 'Pull Request Number (only if test_target == "pr")'
        required: false
      branch_name:
        description: 'Branch name (only if test_target == "branch")'
        default: 'main'
@@ -39,11 +36,10 @@ on:
concurrency:
  group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  check-comment:
    if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
    runs-on: [self-hosted, ampere]
    outputs:
      should_run: ${{ steps.check.outputs.should_run }}
    steps:
@@ -56,12 +52,10 @@ jobs:
          else
            echo "should_run=false" >> $GITHUB_OUTPUT
          fi
  run-tests:
    runs-on: [self-hosted, ampere]
    needs: [check-comment]
    if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
    steps:
      - name: Determine ref
        id: set-ref
@@ -76,16 +70,13 @@ jobs:
        with:
          ref: ${{ steps.set-ref.outputs.ref }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Set up Python
        run: |
          which python
          echo "Setting up Python with Conda"
          conda create -n test_env python=3.11 -y
      - name: Install dependencies
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -95,7 +86,6 @@ jobs:
          echo "Installing dependencies"
          pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
          pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
      - name: Build
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -103,7 +93,6 @@ jobs:
          which python
          NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
          pip install -r tests/requirements.txt
      - name: Setup ComfyUI
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -127,7 +116,6 @@ jobs:
          pip install -r nunchaku_tests/requirements.txt
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
      - name: Run ComfyUI tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -136,7 +124,6 @@ jobs:
          cd ../ComfyUI
          python nunchaku_tests/scripts/nunchaku_flux1_dev.py
          pytest -v nunchaku_tests/
      - name: Nunchaku FLUX memory tests
        run: |
          pwd
@@ -144,28 +131,24 @@ jobs:
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
      - name: Nunchaku FLUX example tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
      - name: Nunchaku FLUX other tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
      - name: Nunchaku SANA tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
      - name: clean up
        if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
        run: |
......
name: Blackwell Tests
on:
  workflow_dispatch:
    inputs:
@@ -10,11 +9,9 @@ on:
        options:
          - pr
          - branch
      pr_number:
        description: 'Pull Request Number (only if test_target == "pr")'
        required: false
      branch_name:
        description: 'Branch name (only if test_target == "branch")'
        default: 'main'
@@ -39,11 +36,10 @@ on:
concurrency:
  group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  check-comment:
    if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
    runs-on: [self-hosted, blackwell]
    outputs:
      should_run: ${{ steps.check.outputs.should_run }}
    steps:
@@ -56,12 +52,10 @@ jobs:
          else
            echo "should_run=false" >> $GITHUB_OUTPUT
          fi
  run-tests:
    runs-on: [self-hosted, blackwell]
    needs: [check-comment]
    if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
    steps:
      - name: Determine ref
        id: set-ref
@@ -76,16 +70,13 @@ jobs:
        with:
          ref: ${{ steps.set-ref.outputs.ref }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Set up Python
        run: |
          which python
          echo "Setting up Python with Conda"
          conda create -n test_env python=3.11 -y
      - name: Install dependencies
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -95,7 +86,6 @@ jobs:
          echo "Installing dependencies"
          pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
          pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
      - name: Build
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -103,7 +93,6 @@ jobs:
          which python
          NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
          pip install -r tests/requirements.txt
      - name: Setup ComfyUI
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -127,7 +116,6 @@ jobs:
          pip install -r nunchaku_tests/requirements.txt
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
      - name: Run ComfyUI tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
@@ -136,7 +124,6 @@ jobs:
          cd ../ComfyUI
          python nunchaku_tests/scripts/nunchaku_flux1_dev.py
          pytest -v nunchaku_tests/
      - name: Nunchaku FLUX memory tests
        run: |
          pwd
@@ -144,28 +131,24 @@ jobs:
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
      - name: Nunchaku FLUX example tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
      - name: Nunchaku FLUX other tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
      - name: Nunchaku SANA tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
      - name: clean up
        if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
        run: |
......
# Adapted from https://github.com/sgl-project/sglang/blob/main/.pre-commit-config.yaml
default_stages: [pre-commit, pre-push, manual]
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
@@ -10,7 +9,7 @@ repos:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: check-toml
      - id: check-ast
      - id: check-added-large-files
@@ -27,7 +26,7 @@ repos:
    rev: v0.11.2
    hooks:
      - id: ruff
        args: [--fixable=F401]
        files: ^(nunchaku/|examples/|tests/|app/)
        exclude: \.ipynb$
  - repo: https://github.com/psf/black
@@ -35,14 +34,14 @@ repos:
    hooks:
      - id: black-jupyter
      - id: black
        args: [-l, "120"]
        files: ^(nunchaku/|examples/|tests/|app/)
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v20.1.3
    hooks:
      - id: clang-format
        types_or: [c++, cuda]
        args: [--style=file, --verbose]
  - repo: https://github.com/kynan/nbstripout
    rev: 0.8.1
    hooks:
@@ -50,3 +49,12 @@ repos:
        args:
          - '--keep-output'
          - '--extra-keys=metadata.kernelspec metadata.language_info.version'
+  - repo: https://github.com/google/yamlfmt
+    rev: v0.17.0
+    hooks:
+      - id: yamlfmt
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.22
+    hooks:
+      - id: mdformat
+        name: (Markdown) Format docs with mdformat
@@ -2,7 +2,7 @@
  <img src="https://raw.githubusercontent.com/mit-han-lab/nunchaku/477953fa1dd6f082fbec201cea7c7430117a810e/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/Nunchaku-519fed7f9de94e"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
@@ -18,15 +18,11 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
-- **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools and also upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
-- **[2025-03-13]** 📦 Separate the ComfyUI node into a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation and release node v0.1.6! Plus, [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) is now fully supported!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 Released!** We've added support for [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to just **4 GiB** while maintaining a **2–3× speedup**. This update also fixes various issues related to resolution, LoRA, pin memory, and runtime stability. Check out the release notes for full details!
-- **[2025-02-20]** 🚀 We release the [pre-built wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! Check [here](#Installation) for the guidance!
- **[2025-02-20]** 🚀 **Support for NVFP4 precision on the NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering a **~3× speedup** on the RTX 5090 over BF16. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
- **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online—try it out!

<details>
<summary>More</summary>
@@ -53,23 +49,24 @@ https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
#### Quantization Method -- SVDQuant

![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)Overview of SVDQuant. Stage 1: Originally, both the activation $\boldsymbol{X}$ and the weights $\boldsymbol{W}$ contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, resulting in the updated activation $\hat{\boldsymbol{X}}$ and weights $\hat{\boldsymbol{W}}$. While $\hat{\boldsymbol{X}}$ becomes easier to quantize, $\hat{\boldsymbol{W}}$ becomes more difficult. Stage 3: SVDQuant further decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$ via SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
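
The Stage 3 decomposition is easy to prototype. Below is a minimal PyTorch sketch of the idea, assuming a rank of 32 and a toy per-tensor symmetric 4-bit quantizer; the library's actual quantizer and kernels are far more involved:

```python
# Illustrative sketch of the SVDQuant weight decomposition; `fake_quant_int4`
# is a toy stand-in for a real 4-bit quantizer.
import torch

def fake_quant_int4(t: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric 4-bit quantization (integer levels -8..7)."""
    scale = t.abs().max() / 7.0 + 1e-12
    return torch.clamp(torch.round(t / scale), -8, 7) * scale

def svdquant_decompose(w_hat: torch.Tensor, rank: int = 32):
    """Split the smoothed weight into a 16-bit low-rank branch L1 @ L2
    plus a 4-bit residual, mirroring Stage 3 above."""
    u, s, vh = torch.linalg.svd(w_hat, full_matrices=False)
    l1 = u[:, :rank] * s[:rank]   # L1: (out, rank), kept at 16-bit
    l2 = vh[:rank, :]             # L2: (rank, in), kept at 16-bit
    residual = w_hat - l1 @ l2    # dominant singular values removed
    return l1, l2, fake_quant_int4(residual)

w_hat = torch.randn(3072, 3072)   # a smoothed weight, after outlier migration
l1, l2, q_residual = svdquant_decompose(w_hat)

def forward(x: torch.Tensor) -> torch.Tensor:
    # 4-bit main branch plus 16-bit low-rank correction.
    return x @ q_residual.T + (x @ l2.T) @ l1.T
```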
#### Nunchaku Engine Design

![engine](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead, due to the extra read of 16-bit inputs in *Down Projection* and the extra write of 16-bit outputs in *Up Projection*. Nunchaku eliminates this overhead with kernel fusion. (b) The *Down Projection* and *Quantize* kernels use the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data movement overhead, we fuse the first two and the latter two kernels together.
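
In code, the fusion boundaries look roughly as follows. This is plain PyTorch standing in for the fused CUDA kernels, so `simulate_int4` is an illustrative stand-in rather than the real quantize kernel; the comments mark which logical operations share one kernel:

```python
# Conceptual sketch of the fused low-rank branch; not the actual CUDA kernels.
import torch

def simulate_int4(t: torch.Tensor) -> torch.Tensor:
    scale = t.abs().amax() / 7.0 + 1e-12
    return torch.clamp(torch.round(t / scale), -8, 7) * scale

def lowrank_linear(x, q_weight, l1, l2):
    # Fused kernel 1: *Down Projection* and *Quantize* read the same 16-bit
    # input x, so fusing them means x is read from global memory only once.
    hidden = x @ l2.T          # rank-32 down projection
    qx = simulate_int4(x)      # 4-bit activation quantization (simulated)

    # Fused kernel 2: *4-Bit Compute* and *Up Projection* write to the same
    # 16-bit output, so the correction is accumulated without a round trip.
    y = qx @ q_weight.T        # 4-bit main branch (simulated)
    y += hidden @ l1.T         # low-rank up projection, added in place
    return y
```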
## Performance

![efficiency](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation

We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels

#### Prerequisites

Before installation, ensure you have [PyTorch>=2.5](https://pytorch.org/) installed. For example, you can use the following command to install PyTorch 2.6:
```shell
@@ -77,6 +74,7 @@ pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### Install nunchaku

Once PyTorch is installed, you can directly install `nunchaku` from [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main), [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku) or [GitHub release](https://github.com/mit-han-lab/nunchaku/releases). Be sure to select the appropriate wheel for your Python and PyTorch version. For example, for Python 3.11 and PyTorch 2.6:

```shell
@@ -111,12 +109,11 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
**Note**:

- Make sure your CUDA version is **at least 12.2 on Linux** and **at least 12.6 on Windows**. If you're using a Blackwell GPU (e.g., 50-series GPUs), CUDA **12.8 or higher is required**.
- For Windows users, please refer to [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) for instructions, and upgrade your MSVC compiler to the latest version.
- We currently support only NVIDIA GPUs with architectures sm_75 (Turing: RTX 2080), sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100); see [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details. A quick way to check your GPU against this list is sketched below.
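
The following snippet is one way to run that check. It is an illustrative sketch using PyTorch's device-query APIs, not Nunchaku's own compatibility check:

```python
# Illustrative compatibility check; the supported list mirrors the notes above
# and is not an official Nunchaku API.
import torch

SUPPORTED = {
    (7, 5): "Turing (sm_75)",
    (8, 0): "Ampere (sm_80)",
    (8, 6): "Ampere (sm_86)",
    (8, 9): "Ada (sm_89)",
    (12, 0): "Blackwell (sm_120, needs CUDA 12.8+ wheels)",
}

def check_gpu() -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("PyTorch cannot see a CUDA device.")
    cap = torch.cuda.get_device_capability(0)   # e.g. (8, 9) on an RTX 4090
    name = torch.cuda.get_device_name(0)
    if cap not in SUPPORTED:
        raise RuntimeError(f"{name} (sm_{cap[0]}{cap[1]}) is not in the supported list.")
    print(f"{name}: {SUPPORTED[cap]}, CUDA runtime {torch.version.cuda} -- looks OK.")

check_gpu()
```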
1. Install dependencies: 1. Install dependencies:
...@@ -136,7 +133,7 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT ...@@ -136,7 +133,7 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
   ```
1. Install `nunchaku` package:
   Make sure you have `gcc/g++>=11`. If you don't, you can install it via Conda on Linux:
   ```shell
   conda install -c conda-forge gcc gxx  # any recent gcc/g++ (>=11) works
   ```
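   The remaining build step below is a sketch of a typical source install, assuming the standard editable-install flow (the submodule step is an assumption; check the repository's build docs):

   ```shell
   git clone https://github.com/mit-han-lab/nunchaku.git
   cd nunchaku
   git submodule update --init --recursive  # assumption: fetches third-party kernel dependencies
   pip install -e .                         # builds the CUDA extension and installs in editable mode
   ```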
## ComfyUI

Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for usage in ComfyUI.
## Gradio Demos
- FLUX.1 Models
  - Text-to-image: see [`app/flux.1/t2i`](app/flux.1/t2i).
  - Sketch-to-Image ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)): see [`app/flux.1/sketch`](app/flux.1/sketch).
  - Depth/Canny-to-Image ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)): see [`app/flux.1/depth_canny`](app/flux.1/depth_canny).
  - Inpainting ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)): see [`app/flux.1/fill`](app/flux.1/fill).
  - Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)): see [`app/flux.1/redux`](app/flux.1/redux).
- SANA:
  - Text-to-image: see [`app/sana/t2i`](app/sana/t2i).
## Customized Model Quantization
Please refer to [app/flux/t2i/README.md](app/flux/t2i/README.md) for instructions.

## Roadmap

Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the April roadmap.
## Contribution
We warmly welcome contributions from the community! To get started, please refer to our [contribution guide](docs/contribution_guide.md) for instructions on how to contribute code to Nunchaku.
## Troubleshooting
For enterprises interested in adopting SVDQuant or Nunchaku, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us.
## Related Projects
- [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
- [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
- [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
- [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
- [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Citation
<img src="https://raw.githubusercontent.com/mit-han-lab/nunchaku/477953fa1dd6f082fbec201cea7c7430117a810e/assets/nunchaku.svg" alt="logo" width="220">
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/Nunchaku-519fed7f9de94e"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
- **[2025-04-09]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to help with installing and using Nunchaku.
- **[2025-04-09]** 📢 Published the [April development roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started quickly and follow Nunchaku's latest progress.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release adds [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support, with faster inference via [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). It also adds [**20-series GPU support**](examples/flux.1-dev-turing.py), covering more users!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 released!** Adds 4-bit text-encoder and per-layer CPU offloading support, cutting FLUX's minimum VRAM requirement to **4 GiB** while maintaining a **2–3× speedup**. This release also fixes stability issues around resolutions, LoRA, and memory pinning; see the changelog for details!
- **[2025-02-20]** 🚀 **NVFP4 precision is supported on the NVIDIA RTX 5090!** NVFP4 delivers better image quality than INT4 and runs **~3×** faster than BF16 on the RTX 5090. See the [blog post](https://hanlab.mit.edu/blog/svdquant-nvfp4); [example code](./examples) and an [online demo](https://svdquant.mit.edu/flux1-schnell/) are available!
- **[2025-02-18]** 🔥 Added guides for [custom LoRA conversion](#%E8%87%AA%E5%AE%9A%E4%B9%89lora) and [custom model quantization](#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B%E9%87%8F%E5%8C%96)! The [ComfyUI](./comfyui) workflows now support **custom LoRAs** and **FLUX.1-tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) was selected as an ICLR 2025 Spotlight! The FLUX.1-tools demos are now live!** The [demos](#%E4%BD%BF%E7%94%A8%E6%BC%94%E7%A4%BA) section has been updated, and a [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also available!
<details>
<summary>More news</summary>
https://github.com/user-attachments/assets/fdd4ab68-6489-4c65-8768-259bd866e8f8
#### Quantization Method -- SVDQuant
![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)Overview of SVDQuant's three stages. Stage 1: the original activations $\boldsymbol{X}$ and weights $\boldsymbol{W}$ both contain outliers, making 4-bit quantization difficult. Stage 2: the activation outliers are migrated into the weights, yielding activations $\hat{\boldsymbol{X}}$ that are easier to quantize and weights $\hat{\boldsymbol{W}}$ that are harder. Stage 3: SVD decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$, and the low-rank branch runs at 16-bit precision to ease the quantization burden.
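The stage-3 decomposition can be illustrated with a toy example (an illustrative sketch, not the library's implementation; the rank `r` here is arbitrary):

```python
import torch

# SVDQuant stage 3 (toy): split the weights into a 16-bit low-rank branch
# L1 @ L2 plus a residual, which is what actually gets 4-bit quantized.
W_hat = torch.randn(1024, 1024)  # stands in for the smoothed weights
r = 32                           # low-rank dimension (illustrative)
U, S, Vh = torch.linalg.svd(W_hat, full_matrices=False)
L1 = U[:, :r] * S[:r]            # (1024, r), absorbs the top singular values
L2 = Vh[:r, :]                   # (r, 1024)
residual = W_hat - L1 @ L2       # the residual carries smaller outliers
print(residual.abs().max() / W_hat.abs().max())
```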
#### Nunchaku Engine Design
### Wheels
#### Prerequisites
Make sure [PyTorch>=2.5](https://pytorch.org/) is installed. For example:
```shell
pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### Install nunchaku
[Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main)[ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku)[GitHub release](https://github.com/mit-han-lab/nunchaku/releases)选择对应Python和PyTorch版本的wheel。例如Python 3.11和PyTorch 2.6: [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main)[ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku)[GitHub release](https://github.com/mit-han-lab/nunchaku/releases)选择对应Python和PyTorch版本的wheel。例如Python 3.11和PyTorch 2.6:
```shell
# replace <version> with the nunchaku release you want to install
pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-<version>+torch2.6-cp311-cp311-linux_x86_64.whl
```
**Note**:
- Linux requires CUDA ≥ 12.2 and Windows requires CUDA ≥ 12.6. Blackwell GPUs require CUDA ≥ 12.8.
- Windows users: please refer to [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) and upgrade your MSVC compiler.
- Supported GPU architectures are sm_75 (Turing: RTX 2080), sm_86 (Ampere: RTX 3090), sm_89 (Ada: RTX 4090), and sm_80 (A100); see [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for details.
### Build from source

1. Install dependencies:
   If you're using a Blackwell GPU (e.g., 50-series GPUs), install PyTorch built against CUDA 12.8:

   ```shell
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
   ```
1. Build and install `nunchaku`:
   Make sure you have `gcc/g++>=11`. On Linux, you can install it via Conda:
   ```shell
   conda install -c conda-forge gcc gxx  # any recent gcc/g++ (>=11) works
   ```

Once installed, the usage example generates and saves an image:

```python
image = pipeline("a cat holding a 'Hello World' sign", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save(f"flux.1-dev-{precision}.png")
```
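For context, here is a fuller sketch of the same example. The setup lines are assumptions modeled on Nunchaku's public FLUX examples (`NunchakuFluxTransformer2dModel`, the `mit-han-lab/svdq-int4-flux.1-dev` checkpoint), not necessarily this README's exact code:

```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel

precision = "int4"  # assumption: "fp4" would target Blackwell GPUs
transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("a cat holding a 'Hello World' sign", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save(f"flux.1-dev-{precision}.png")
```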
**Note**: **Turing GPU (20-series) users** need to set `torch_dtype=torch.float16` and use the `nunchaku-fp16` attention module; see [`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py) for a complete example.
### FP16 Attention
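As a sketch, the FP16 attention kernel is typically enabled through the transformer's attention-implementation setter (assuming Nunchaku's `set_attention_impl` API):

```python
# Assumption: switch the quantized transformer's attention kernel to the
# FP16 implementation; pass "flashattn2" to switch back to the default.
transformer.set_attention_impl("nunchaku-fp16")
```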
Nunchaku supports [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) and FLUX…
## Gradio Demos
- FLUX.1 models
  - Text-to-image: see [`app/flux.1/t2i`](app/flux.1/t2i).
  - Sketch-to-image ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)): see [`app/flux.1/sketch`](app/flux.1/sketch).
  - Depth/Canny-to-image ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)): see [`app/flux.1/depth_canny`](app/flux.1/depth_canny).
  - Inpainting ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)): see [`app/flux.1/fill`](app/flux.1/fill).
  - Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)): see [`app/flux.1/redux`](app/flux.1/redux).
- SANA:
  - Text-to-image: see [`app/sana/t2i`](app/sana/t2i).
## Customized Model Quantization
## Roadmap

Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the April roadmap.
## Contribution
We warmly welcome contributions from the community! Please refer to the [contribution guide](docs/contribution_guide_ZH.md) for instructions on contributing code to Nunchaku.
## Troubleshooting
## Related Projects
- [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
- [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
- [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
- [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
- [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Citation
"""Attach single-file loader metadata (`model_class` and `quantization_config`)
to Nunchaku's quantized safetensors checkpoints."""

import json
from pathlib import Path

import yaml
from safetensors.torch import save_file
from tqdm import tqdm  # noqa: F401  (used by the commented batch flow below)

from nunchaku.utils import load_state_dict_in_safetensors
def load_yaml(path: str | Path) -> dict:
    with open(path, "r", encoding="utf-8") as file:
        data = yaml.safe_load(file)
    return data
if __name__ == "__main__":
    # Earlier one-off flows, kept commented for reference.
    # data = load_yaml("nunchaku_models.yaml")
    # for model in tqdm(data["diffusion_models"]):
    #     for precision in ["int4", "fp4"]:
    #         repo_id = model["repo_id"]
    #         filename = model["filename"].format(precision=precision)
    #         sd, metadata = load_state_dict_in_safetensors(Path(repo_id) / filename, return_metadata=True)
    #         metadata["model_class"] = "NunchakuFluxTransformer2dModel"
    #         quantization_config = {
    #             "method": "svdquant",
    #             "weight": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #             "activation": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #         }
    #         metadata["quantization_config"] = json.dumps(quantization_config)
    #         output_dir = Path("nunchaku-models") / Path(repo_id).name
    #         output_dir.mkdir(parents=True, exist_ok=True)
    #         save_file(sd, output_dir / filename, metadata=metadata)

    # sd, metadata = load_state_dict_in_safetensors(
    #     "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors", return_metadata=True
    # )
    # metadata["model_class"] = "NunchakuT5EncoderModel"
    # quantization_config = {"method": "awq", "weight": {"dtype": "int4", "scale_dtype": None, "group_size": 128}}
    # metadata["quantization_config"] = json.dumps(quantization_config)
    # output_dir = Path("nunchaku-models") / "nunchaku-t5"
    # output_dir.mkdir(parents=True, exist_ok=True)
    # save_file(sd, output_dir / "awq-int4-flux.1-t5xxl.safetensors", metadata=metadata)
    # Active flow: attach loader metadata to the quantized SANA checkpoint.
    sd, metadata = load_state_dict_in_safetensors(
        "mit-han-lab/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors", return_metadata=True
    )
    metadata["model_class"] = "NunchakuSanaTransformer2DModel"
    precision = "int4"
    quantization_config = {
        "method": "svdquant",
        "weight": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
        "activation": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
    }
    # Serialize the config into the file header (mirrors the commented flows above).
    metadata["quantization_config"] = json.dumps(quantization_config)
    output_dir = Path("nunchaku-models") / "nunchaku-sana"
    output_dir.mkdir(parents=True, exist_ok=True)
    save_file(sd, output_dir / "svdq-int4_r32-sana1.6b.safetensors", metadata=metadata)
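To verify what the script wrote, the metadata can be read back with `safetensors` (a small check, assuming the output path produced above):

```python
from safetensors import safe_open

path = "nunchaku-models/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors"
with safe_open(path, framework="pt") as f:
    meta = f.metadata()  # str -> str mapping stored in the file header
    print(meta["model_class"])          # "NunchakuSanaTransformer2DModel"
    print(meta["quantization_config"])  # the JSON string written above
```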
This interactive Gradio application transforms your uploaded image into a different image following your text prompt, while preserving its depth map or Canny edges.
The base models are:
- [FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) (preserves depth map)
- [FLUX.1-Canny-dev](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) (preserves Canny edge)
First, you need to install some dependencies. Then run:

```shell
python run_gradio.py
```
- By default, the model is `FLUX.1-Depth-dev`. You can add `-m canny` to switch to `FLUX.1-Canny-dev`.
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model. These flags can be combined, as in the example below.
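An illustrative invocation combining only the options documented above (a hypothetical combination, not from the README):

```shell
python run_gradio.py -m canny --use-qencoder --no-safety-checker
```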
This interactive Gradio application allows you to interactively inpaint an uploaded image. To launch the application, run:

```shell
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
This interactive Gradio application allows you to interactively generate image variations of an uploaded image. To launch the application, simply run:

```shell
python run_gradio.py
```
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
To launch the application, simply run:

```shell
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.