Unverified commit e2116d7b, authored by Muyang Li, committed by GitHub

chore: release v0.3.0

parents 6098c419 d94c2078
BasedOnStyle: LLVM # K&R / "attach" braces like the code now
IndentWidth: 4 # 4-space indents everywhere
TabWidth: 4
UseTab: Never # never convert to tabs
ColumnLimit: 120
AccessModifierOffset: -4
BreakBeforeBraces: Attach # `void foo() {` — brace on same line
BraceWrapping:
  AfterNamespace: false # `namespace x {` on same line
  SplitEmptyFunction: false
  SplitEmptyRecord: false
  SplitEmptyNamespace: false
PointerAlignment: Right # `int *ptr`, `const Foo *bar`
ReferenceAlignment: Pointer # `int &ref` -> same rule as pointers
SortIncludes: false # keep the hand-crafted include order
IncludeBlocks: Preserve
SortUsingDeclarations: false
IndentPPDirectives: None # keep `#pragma` / `#if` at column 0
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortBlocksOnASingleLine: false
BinPackParameters: false # one parameter per line (as written)
BinPackArguments: false
AlignAfterOpenBracket: Align # preserve the current hanging-indent style
AlignConsecutiveAssignments: true
AlignConsecutiveDeclarations: false
SpaceAfterTemplateKeyword: false
BreakTemplateDeclarations: Yes
@@ -3,37 +3,36 @@ name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. I have searched for related issues and FAQs (https://github.com/mit-han-lab/nunchaku/blob/main/docs/faq.md) but was unable to find a solution.
        - label: 2. The issue persists in the latest version.
        - label: 3. Please note that without environment information and a minimal reproducible example, it will be difficult for us to reproduce and address the issue, which may delay our response.
        - label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
        - label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
        - label: 6. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Describe the Bug
      description: Provide a clear and concise explanation of the bug you encountered.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Environment
      description: |
        Please include relevant environment details such as your system specifications, Python version, PyTorch version, and CUDA version.
      placeholder: "Example: Ubuntu 24.04, Python 3.11, PyTorch 2.6, CUDA 12.4"
    validations:
      required: true
  - type: textarea
    attributes:
      label: Reproduction Steps
      description: |
        What command or script did you execute? Which **model** were you using?
      placeholder: "Example: python run_model.py --config config.json"
    validations:
      required: true
@@ -2,23 +2,22 @@
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
        - label: 2. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Motivation
      description: |
        A clear and concise description of the motivation for the feature.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Related resources
      description: |
        If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
name: Auto-merge main into dev
on:
  workflow_dispatch:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  merge-main-into-dev:
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Checkout the repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_TOKEN }}
      - name: Check if main and dev are already in sync
        id: check_sync
        run: |
@@ -36,7 +31,6 @@ jobs:
            echo "Branches differ. Proceeding with merge."
            echo "skip_merge=false" >> "$GITHUB_OUTPUT"
          fi
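The collapsed portion of this step presumably compares the two branch heads; the visible tail shows the step-output mechanism: writing `key=value` lines to the file named by `$GITHUB_OUTPUT` so later steps can read `steps.check_sync.outputs.skip_merge`. A standalone sketch of that mechanism, with stand-in SHAs and a temp file in place of the runner-provided one:

```shell
GITHUB_OUTPUT=$(mktemp)   # on a real runner this path is provided by Actions
main_sha=abc123           # stand-in values; a real step would use git rev-parse
dev_sha=abc123
if [ "$main_sha" = "$dev_sha" ]; then
    echo "Branches are in sync. Skipping merge."
    echo "skip_merge=true" >> "$GITHUB_OUTPUT"
else
    echo "Branches differ. Proceeding with merge."
    echo "skip_merge=false" >> "$GITHUB_OUTPUT"
fi
cat "$GITHUB_OUTPUT"      # prints: skip_merge=true
```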
      - name: Merge main into dev
        id: last_commit
        if: steps.check_sync.outputs.skip_merge == 'false'
...
name: Clean Old Nightly Releases
on:
  schedule:
    - cron: '* 6 * * *'
  workflow_dispatch:
permissions:
  contents: write
jobs:
  cleanup:
    name: Delete old nightly releases and tags
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: List all nightly releases
        id: list
        run: |
@@ -26,14 +21,12 @@ jobs:
          echo "Found $(wc -l < nightly_tags.txt) nightly releases."
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Trim to old tags beyond latest 30
        id: filter
        run: |
          tail -n +31 nightly_tags.txt > to_delete.txt || true
          echo "Tags to delete:"
          cat to_delete.txt || echo "(none)"
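The trim step relies on `tail -n +31`, which prints from line 31 onward, so the 30 newest tags (lines 1–30, assuming the list is sorted newest-first) survive and only older ones land in `to_delete.txt`. A self-contained sketch with fabricated tag names:

```shell
# Simulate a newest-first list of 35 nightly tags (names are stand-ins).
seq 1 35 | sed 's/^/nightly-tag-/' > nightly_tags.txt
# Keep the latest 30; everything from line 31 on is scheduled for deletion.
tail -n +31 nightly_tags.txt > to_delete.txt || true
wc -l < to_delete.txt    # prints: 5
head -n 1 to_delete.txt  # prints: nightly-tag-31
```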
      - name: Delete releases and tags
        run: |
          while read tag; do
@@ -43,6 +36,5 @@ jobs:
          done < to_delete.txt
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Done
        run: echo "Nightly cleanup completed."
# Borrowed from https://github.com/sgl-project/sglang/blob/main/.github/workflows/close-inactive-issues.yml
name: Close Inactive Issues
on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
permissions:
  issues: write
  contents: read
jobs:
  close-inactive-issues:
    if: github.repository == 'mit-han-lab/nunchaku'
...
name: Lint
on:
  push:
    branches:
      - main
      - dev
  pull_request:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install pre-commit hook
        run: |
          python -m pip install pre-commit
          pre-commit install
      - name: Linting
        run: pre-commit run --all-files
name: Nightly Build
on:
  schedule:
    - cron: '0 8 * * *' # UTC time
  workflow_dispatch:
permissions:
  contents: write
jobs:
  tag:
    name: Tag dev branch if dev version
@@ -22,51 +19,38 @@ jobs:
        with:
          fetch-depth: 0
          ref: dev
      - name: Extract version from __version__.py
        id: version
        run: |
          version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
          echo "Extracted version: $version"
          echo "version=$version" >> "$GITHUB_OUTPUT"
      - name: Determine if build is needed
        id: check
        run: |
          version="${{ steps.version.outputs.version }}"
          need_build=false
          if [[ "$version" == *dev* ]]; then
            echo "Version contains 'dev'"
            prefix="v$version"
            tag=$(git tag --list "${prefix}*" --sort=-creatordate | head -n 1 || echo "")
            if [ -z "$tag" ]; then
              echo "No previous tag found."
              need_build=true
            else
              base=$(git rev-parse "$tag")
              head=$(git rev-parse HEAD)
              if [ "$base" != "$head" ]; then
                echo "New commits found since $tag"
                need_build=true
              else
                echo "No new commits since $tag"
              fi
            fi
          else
            echo "Version does not contain 'dev'"
          fi
          echo "need_build=$need_build" >> "$GITHUB_OUTPUT"
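The `grep | sed` pipeline above pulls the quoted version string out of `nunchaku/__version__.py`, and the `*dev*` glob test decides whether a nightly is warranted. A runnable sketch against a stand-in version file (the version string `0.3.1dev0` is fabricated for the example):

```shell
mkdir -p nunchaku
echo '__version__ = "0.3.1dev0"' > nunchaku/__version__.py   # stand-in content
# Same extraction as the workflow: capture whatever sits between the quotes.
version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
echo "$version"   # prints: 0.3.1dev0
if [[ "$version" == *dev* ]]; then
    echo "nightly build needed"
fi
```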
      - name: Set tag name
        id: tag
        if: steps.check.outputs.need_build == 'true'
@@ -75,7 +59,6 @@ jobs:
          tag_name="v${{ steps.version.outputs.version }}$today"
          echo "tag_name=$tag_name"
          echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
      - name: Create and push tag
        if: steps.check.outputs.need_build == 'true'
        run: |
@@ -83,11 +66,9 @@ jobs:
          git config user.email "github-actions@users.noreply.github.com"
          git tag ${{ steps.tag.outputs.tag_name }}
          git push origin ${{ steps.tag.outputs.tag_name }}
      - name: Skip tagging (version is not dev or no new commits)
        if: steps.check.outputs.need_build == 'false'
        run: echo "Version is not a dev version or no new commits. Skipping tag."
  linux-wheels:
    name: Build the linux nightly wheels
    runs-on: [self-hosted, linux-build]
@@ -97,7 +78,6 @@ jobs:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -105,10 +85,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.tag.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        run: |
          if [[ "${{ matrix.torch }}" == "2.7" ]]; then
@@ -117,7 +95,6 @@ jobs:
            cuda_version="12.4"
          fi
          bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
@@ -127,21 +104,18 @@ jobs:
          prerelease: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Clean up
        if: always() && github.repository == 'mit-han-lab/nunchaku'
        run: bash scripts/linux_cleanup.sh
  windows-wheels:
    name: Build the windows nightly wheels
    runs-on: [self-hosted, windows-build]
    needs: tag
    if: needs.tag.outputs.need_build == 'true' && github.repository == 'mit-han-lab/nunchaku'
    strategy:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -149,10 +123,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.tag.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        shell: cmd
        run: |
@@ -164,7 +136,6 @@ jobs:
          )
          call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
          call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
...
name: Release Build
on:
  workflow_dispatch:
permissions:
  contents: write
jobs:
  release:
    name: Tag Main Branch and Create Release
@@ -19,14 +16,12 @@ jobs:
        with:
          fetch-depth: 0
          ref: main
      - name: Extract version from __version__.py
        id: version
        run: |
          version=$(grep '__version__' nunchaku/__version__.py | sed -E 's/.*"([^"]+)".*/\1/')
          echo "Extracted version: $version"
          echo "version=$version" >> "$GITHUB_OUTPUT"
      - name: Create and push tag
        id: tag
        run: |
@@ -36,7 +31,6 @@ jobs:
          git tag $tag_name
          git push origin $tag_name
          echo "tag_name=$tag_name" >> "$GITHUB_OUTPUT"
  linux-wheels:
    name: Build the linux release wheels
    runs-on: [self-hosted, linux-build]
@@ -45,7 +39,6 @@ jobs:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -53,10 +46,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.release.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        run: |
          if [[ "${{ matrix.torch }}" == "2.7" ]]; then
@@ -65,7 +56,6 @@ jobs:
            cuda_version="12.4"
          fi
          bash scripts/build_linux_wheel.sh ${{ matrix.python }} ${{ matrix.torch }} $cuda_version
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
@@ -75,20 +65,17 @@ jobs:
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Clean up
        if: always()
        run: bash scripts/linux_cleanup.sh
  windows-wheels:
    name: Build the windows release wheels
    runs-on: [self-hosted, windows-build]
    needs: release
    strategy:
      matrix:
        python: ["3.10", "3.11", "3.12"]
        torch: ["2.5", "2.6", "2.7"]
    steps:
      - name: Checkout to the tag
        uses: actions/checkout@v4
@@ -96,10 +83,8 @@ jobs:
          fetch-depth: 0
          ref: ${{ needs.release.outputs.tag_name }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Build wheels
        shell: cmd
        run: |
@@ -111,7 +96,6 @@ jobs:
          )
          call C:\Users\muyangl\miniconda3\condabin\activate.bat activate
          call scripts\build_windows_wheel.cmd ${{ matrix.python }} %TORCH_VERSION% %CUDA_VERSION%
      - name: Upload wheels to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
...
name: Synchronize to Private Repository
on:
  workflow_dispatch:
  push:
    branches:
      - dev
permissions:
  contents: write
jobs:
  cherry-pick-commits:
    runs-on: ubuntu-latest
    if: github.repository == 'mit-han-lab/nunchaku'
    steps:
      - name: Clone private repository
        run: |
          git clone https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku-dev.git
      - name: Add public remote and fetch
        run: |
          cd nunchaku-dev
          git remote add public https://x-access-token:${{ secrets.GH_TOKEN }}@github.com/mit-han-lab/nunchaku.git
          git fetch public dev
      - name: Cherry-pick latest commit from public/dev
        run: |
          set -e
@@ -94,7 +88,6 @@ jobs:
          done
          git commit --amend --allow-empty -m "$NEW_MSG" --author="$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
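The visible tail of this step uses `git commit --amend --author=...` to rewrite the just-created commit's message while pinning its author. A self-contained sketch of that mechanism in a throwaway repository (all names, emails, and messages below are stand-ins, not the workflow's actual values):

```shell
git init -q demo && cd demo
git config user.name "runner" && git config user.email "runner@example.com"
echo hi > f.txt && git add f.txt && git commit -qm "original"
# Rewrite the last commit: new message, forced author (as the workflow does).
NEW_MSG="rewritten"
GIT_AUTHOR_NAME="Muyang Li"
GIT_AUTHOR_EMAIL="muyang@example.com"   # stand-in address
git commit --amend --allow-empty -qm "$NEW_MSG" --author="$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
git log -1 --format='%an %s'   # prints: Muyang Li rewritten
```

Note the committer stays `runner`; `--author` overrides only the author identity.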
      - name: Push to the private main branch
        run: |
          cd nunchaku-dev
...
name: Ampere Tests name: Ampere Tests
on: on:
workflow_dispatch: workflow_dispatch:
inputs: inputs:
...@@ -10,11 +9,9 @@ on: ...@@ -10,11 +9,9 @@ on:
options: options:
- pr - pr
- branch - branch
pr_number: pr_number:
description: 'Pull Request Number (only if test_target == "pr")' description: 'Pull Request Number (only if test_target == "pr")'
required: false required: false
branch_name: branch_name:
description: 'Branch name (only if test_target == "branch")' description: 'Branch name (only if test_target == "branch")'
default: 'main' default: 'main'
...@@ -39,11 +36,10 @@ on: ...@@ -39,11 +36,10 @@ on:
concurrency: concurrency:
group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true cancel-in-progress: true
jobs: jobs:
check-comment: check-comment:
if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }} if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
runs-on: [ self-hosted, ampere ] runs-on: [self-hosted, ampere]
outputs: outputs:
should_run: ${{ steps.check.outputs.should_run }} should_run: ${{ steps.check.outputs.should_run }}
steps: steps:
...@@ -56,12 +52,10 @@ jobs: ...@@ -56,12 +52,10 @@ jobs:
else else
echo "should_run=false" >> $GITHUB_OUTPUT echo "should_run=false" >> $GITHUB_OUTPUT
fi fi
run-tests: run-tests:
runs-on: [ self-hosted, ampere ] runs-on: [self-hosted, ampere]
needs: [ check-comment ] needs: [check-comment]
if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }} if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
steps: steps:
- name: Determine ref - name: Determine ref
id: set-ref id: set-ref
...@@ -76,16 +70,13 @@ jobs: ...@@ -76,16 +70,13 @@ jobs:
with: with:
ref: ${{ steps.set-ref.outputs.ref }} ref: ${{ steps.set-ref.outputs.ref }}
submodules: true submodules: true
- name: Show current commit - name: Show current commit
run: git log -1 --oneline run: git log -1 --oneline
- name: Set up Python - name: Set up Python
run: | run: |
which python which python
echo "Setting up Python with Conda" echo "Setting up Python with Conda"
conda create -n test_env python=3.11 -y conda create -n test_env python=3.11 -y
- name: Install dependencies - name: Install dependencies
run: | run: |
source $(conda info --base)/etc/profile.d/conda.sh source $(conda info --base)/etc/profile.d/conda.sh
...@@ -95,7 +86,6 @@ jobs: ...@@ -95,7 +86,6 @@ jobs:
echo "Installing dependencies" echo "Installing dependencies"
pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128 pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31 pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
- name: Build - name: Build
run: | run: |
source $(conda info --base)/etc/profile.d/conda.sh source $(conda info --base)/etc/profile.d/conda.sh
...@@ -103,7 +93,6 @@ jobs: ...@@ -103,7 +93,6 @@ jobs:
which python which python
NUNCHAKU_INSTALL_MODE=ALL python setup.py develop NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
pip install -r tests/requirements.txt pip install -r tests/requirements.txt
- name: Setup ComfyUI - name: Setup ComfyUI
run: | run: |
source $(conda info --base)/etc/profile.d/conda.sh source $(conda info --base)/etc/profile.d/conda.sh
...@@ -127,7 +116,6 @@ jobs: ...@@ -127,7 +116,6 @@ jobs:
pip install -r nunchaku_tests/requirements.txt pip install -r nunchaku_tests/requirements.txt
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
- name: Run ComfyUI tests - name: Run ComfyUI tests
run: | run: |
source $(conda info --base)/etc/profile.d/conda.sh source $(conda info --base)/etc/profile.d/conda.sh
...@@ -136,7 +124,6 @@ jobs: ...@@ -136,7 +124,6 @@ jobs:
cd ../ComfyUI cd ../ComfyUI
python nunchaku_tests/scripts/nunchaku_flux1_dev.py python nunchaku_tests/scripts/nunchaku_flux1_dev.py
pytest -v nunchaku_tests/ pytest -v nunchaku_tests/
- name: Nunchaku FLUX memory tests - name: Nunchaku FLUX memory tests
run: | run: |
pwd pwd
...@@ -144,28 +131,24 @@ jobs: ...@@ -144,28 +131,24 @@ jobs:
conda activate test_env || { echo "Failed to activate conda env"; exit 1; } conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
which python which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
      - name: Nunchaku FLUX example tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
      - name: Nunchaku FLUX other tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
      - name: Nunchaku SANA tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_AMPERE }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
      - name: clean up
        if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
        run: |
name: Blackwell Tests
on:
  workflow_dispatch:
    inputs:
        options:
          - pr
          - branch
      pr_number:
        description: 'Pull Request Number (only if test_target == "pr")'
        required: false
      branch_name:
        description: 'Branch name (only if test_target == "branch")'
        default: 'main'
concurrency:
  group: ${{ github.repository }}-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
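The `group` expression relies on GitHub's `||` operator to fall back to the branch ref when there is no pull-request number, so one concurrency slot exists per PR or per branch. A small Python sketch of that fallback (the function and its arguments are illustrative, not part of the workflow):

```python
def concurrency_group(repo: str, workflow: str, pr_number, ref: str) -> str:
    # Mirrors ${{ github.repository }}-${{ github.workflow }}-${{ pr_number || ref }}:
    # Python's `or`, like GitHub's `||`, yields the ref when pr_number is empty/None.
    return f"{repo}-{workflow}-{pr_number or ref}"
```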
jobs:
  check-comment:
    if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && !github.event.pull_request.draft) }}
    runs-on: [self-hosted, blackwell]
    outputs:
      should_run: ${{ steps.check.outputs.should_run }}
    steps:
          else
            echo "should_run=false" >> $GITHUB_OUTPUT
          fi
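The truncated step above decides whether an `issue_comment` event should trigger a run (the exact comment check is collapsed in this diff). As a hypothetical Python sketch of the gating predicate, with the trigger phrase `/run-tests` assumed rather than taken from the source:

```python
def should_run(event_name: str, comment_body: str = "") -> bool:
    # workflow_dispatch always runs; issue comments must opt in via a
    # trigger phrase (the real phrase is collapsed in this diff and assumed here)
    if event_name == "workflow_dispatch":
        return True
    if event_name == "issue_comment":
        return comment_body.strip().lower().startswith("/run-tests")
    return False
```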
  run-tests:
    runs-on: [self-hosted, blackwell]
    needs: [check-comment]
    if: ${{ github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true' }}
    steps:
      - name: Determine ref
        id: set-ref
        with:
          ref: ${{ steps.set-ref.outputs.ref }}
          submodules: true
      - name: Show current commit
        run: git log -1 --oneline
      - name: Set up Python
        run: |
          which python
          echo "Setting up Python with Conda"
          conda create -n test_env python=3.11 -y
      - name: Install dependencies
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          echo "Installing dependencies"
          pip install torch==2.7 torchvision==0.22 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
          pip install ninja wheel diffusers==0.33.1 transformers==4.51 accelerate==1.7 sentencepiece==0.2 protobuf==6.31 huggingface_hub==0.31
      - name: Build
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          which python
          NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
          pip install -r tests/requirements.txt
      - name: Setup ComfyUI
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          pip install -r nunchaku_tests/requirements.txt
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_models.py
          HF_TOKEN=${{ secrets.HF_TOKEN }} python custom_nodes/ComfyUI-nunchaku/scripts/download_test_data.py
      - name: Run ComfyUI tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          cd ../ComfyUI
          python nunchaku_tests/scripts/nunchaku_flux1_dev.py
          pytest -v nunchaku_tests/
      - name: Nunchaku FLUX memory tests
        run: |
          pwd
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_memory.py
      - name: Nunchaku FLUX example tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux/test_flux_examples.py
      - name: Nunchaku FLUX other tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/flux --ignore=tests/flux/test_flux_memory.py --ignore=tests/flux/test_flux_examples.py
      - name: Nunchaku SANA tests
        run: |
          source $(conda info --base)/etc/profile.d/conda.sh
          conda activate test_env || { echo "Failed to activate conda env"; exit 1; }
          which python
          NUNCHAKU_TEST_CACHE_ROOT=${{ secrets.NUNCHAKU_TEST_CACHE_ROOT_BLACKWELL }} HF_TOKEN=${{ secrets.HF_TOKEN }} pytest -v tests/sana
      - name: clean up
        if: always() && (github.event_name != 'issue_comment' || needs.check-comment.outputs.should_run == 'true')
        run: |
# Adapted from https://github.com/sgl-project/sglang/blob/main/.pre-commit-config.yaml
default_stages: [pre-commit, pre-push, manual]
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: check-toml
      - id: check-ast
      - id: check-added-large-files
      - id: check-merge-conflict
      # - id: check-shebang-scripts-are-executable
      - id: detect-private-key
      # - id: debug-statements
      # - id: no-commit-to-branch
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
    rev: v0.11.2
    hooks:
      - id: ruff
        args: [--fixable=F401]
        files: ^(nunchaku/|examples/|tests/|app/)
        exclude: \.ipynb$
  - repo: https://github.com/psf/black
    hooks:
      - id: black-jupyter
      - id: black
        args: [-l, "120"]
        files: ^(nunchaku/|examples/|tests/|app/)
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v20.1.3
    hooks:
      - id: clang-format
        types_or: [c++, cuda]
        args: [--style=file, --verbose]
  - repo: https://github.com/kynan/nbstripout
    rev: 0.8.1
    hooks:
        args:
          - '--keep-output'
          - '--extra-keys=metadata.kernelspec metadata.language_info.version'
  - repo: https://github.com/google/yamlfmt
    rev: v0.17.0
    hooks:
      - id: yamlfmt
  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.22
    hooks:
      - id: mdformat
        name: (Markdown) Format docs with mdformat
<img src="https://raw.githubusercontent.com/mit-han-lab/nunchaku/477953fa1dd6f082fbec201cea7c7430117a810e/assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/nunchaku-6837e7498f680552f7bbb5ad"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/Nunchaku-519fed7f9de94e"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
- **[2025-04-09]** 🎥 Released [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) tutorial videos to help you install and use Nunchaku.
- **[2025-04-09]** 📢 Published the [April development roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started quickly and follow Nunchaku's latest progress.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** Supports [**multi-LoRA fusion**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py), with faster inference via [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). Also adds [**20-series GPU support**](examples/flux.1-dev-turing.py) to reach more users!
- **[2025-03-17]** 🚀 Released the NVFP4 4-bit quantized [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1 tool suite, and upgraded the INT4 FLUX.1 tool models. Download the updates from [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641)!
- **[2025-03-13]** 📦 The ComfyUI node now has a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation! Node version v0.1.6 is live with full support for [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar)!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 released!** Supports the 4-bit text encoder and per-layer CPU offloading, lowering FLUX's minimum VRAM requirement to **4 GiB** while keeping a **2–3× speedup**. Also fixes stability issues around resolutions, LoRA, and memory pinning; see the changelog for details!
- **[2025-02-20]** 🚀 Released [pre-built wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! See the [installation guide](#安装指南).
- **[2025-02-20]** 🚀 **NVFP4 precision is supported on the NVIDIA RTX 5090!** NVFP4 delivers better image quality than INT4 and runs **~3× faster** than BF16 on the RTX 5090. The [blog post](https://hanlab.mit.edu/blog/svdquant-nvfp4), [example code](./examples), and [online demo](https://svdquant.mit.edu/flux1-schnell/) are live!
- **[2025-02-18]** 🔥 Added guides for [custom LoRA conversion](#%E8%87%AA%E5%AE%9A%E4%B9%89lora) and [model quantization](#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B%E9%87%8F%E5%8C%96)! The [ComfyUI](./comfyui) workflows now support **custom LoRAs** and the **FLUX.1 tool suite**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) was selected as an ICLR 2025 Spotlight, and the FLUX.1 tool-suite demos are live!** The [demos](#%E4%BD%BF%E7%94%A8%E6%BC%94%E7%A4%BA) have been updated, and the [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also open!

<details>
<summary>More news</summary>

- **[2025-02-04]** **🚀 The 4-bit quantized [FLUX.1 tool suite](https://blackforestlabs.ai/flux-1-tools/) is released!** It runs **2–3× faster** than the original models. The [example code](./examples) has been updated; **ComfyUI support is coming soon!**
- **[2025-01-23]** 🚀 **4-bit quantized [SANA](https://nvlabs.github.io/Sana/) is supported!** It runs 2–3× faster than the 16-bit model. See the [usage example](examples/sana1.6b_pag.py) and [deployment guide](app/sana/t2i), and try the [online demo](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) was accepted to **ICLR 2025**!
- **[2024-12-08]** Added [ComfyUI](https://github.com/comfyanonymous/ComfyUI) support; see [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for details.
- **[2024-11-07]** 🔥 Our latest **W4A4** diffusion-model quantization work, [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant), is open-sourced! The quantization library [**DeepCompressor**](https://github.com/mit-han-lab/deepcompressor) is released alongside it.
#### Quantization Method -- SVDQuant
![intuition](https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/intuition.gif)Overview of SVDQuant's three stages. Stage 1: the original activations $\boldsymbol{X}$ and weights $\boldsymbol{W}$ both contain outliers, making 4-bit quantization difficult. Stage 2: we migrate the activation outliers into the weights, yielding activations $\hat{\boldsymbol{X}}$ that are easier to quantize and weights $\hat{\boldsymbol{W}}$ that are harder to quantize. Stage 3: SVD decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$; the low-rank branch runs in 16-bit precision to ease the quantization difficulty.
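In symbols, the smoothing and decomposition described in the caption amount to splitting the linear layer into two branches:

```math
\boldsymbol{X}\boldsymbol{W} \;=\; \hat{\boldsymbol{X}}\hat{\boldsymbol{W}} \;=\; \underbrace{\hat{\boldsymbol{X}}\,\boldsymbol{L}_1\boldsymbol{L}_2}_{\text{16-bit low-rank branch}} \;+\; \underbrace{\hat{\boldsymbol{X}}\bigl(\hat{\boldsymbol{W}} - \boldsymbol{L}_1\boldsymbol{L}_2\bigr)}_{\text{4-bit residual branch}}
```

Because the low-rank branch absorbs the dominant singular values of $\hat{\boldsymbol{W}}$, the residual has a much narrower range, which is what makes 4-bit quantization of the second branch feasible.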
#### Nunchaku Engine Design
### Installing the Wheel

#### Prerequisites

Make sure [PyTorch>=2.5](https://pytorch.org/) is installed. For example:
```shell
pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### Installing nunchaku

Pick the wheel matching your Python and PyTorch versions from [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main), [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku), or the [GitHub release](https://github.com/mit-han-lab/nunchaku/releases). For example, for Python 3.11 and PyTorch 2.6:
```shell
pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
```
**Note**:

- Linux requires CUDA≥12.2 and Windows requires CUDA≥12.6; Blackwell GPUs require CUDA≥12.8.
- Windows users: see [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) for upgrading the MSVC compiler.
- Supported GPU architectures include SM_75 (Turing: RTX 2080), SM_86 (Ampere: RTX 3090), SM_89 (Ada: RTX 4090), and SM_80 (A100); see [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for details.
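If you script wheel selection, the release filenames follow a version/torch/CPython-tag pattern. The helper below is a hypothetical sketch of that template, modeled on the wheel links above; verify it against the actual release listing before relying on it:

```python
def wheel_name(version: str, torch: str, py: str, plat: str = "linux_x86_64") -> str:
    # Hypothetical filename template modeled on the release assets, e.g.
    # nunchaku-<version>+torch<torch>-cp<py>-cp<py>-<platform>.whl
    cp = "cp" + py.replace(".", "")
    return f"nunchaku-{version}+torch{torch}-{cp}-{cp}-{plat}.whl"
```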
1. Install dependencies:
```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
2. Build and install:

Make sure `gcc/g++>=11`. Linux users can install them via Conda:
```shell
conda install -c conda-forge gxx=11 gcc=11
```
Windows users should install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).

Then build:
```shell
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
python setup.py develop
```
To package a wheel:
```shell
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```
Setting `NUNCHAKU_INSTALL_MODE=ALL` ensures the wheel supports all GPU architectures.
## Usage Examples
```python
image = pipeline("A cat holding a sign that says 'Hello World'", num_inference_steps=50, g
image.save(f"flux.1-dev-{precision}.png")
```
**注意****Turing显卡用户(如20系列)**需设置`torch_dtype=torch.float16`并使用`nunchaku-fp16`注意力模块,完整示例见[`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py) **注意**\*\*Turing显卡用户(如20系列)\*\*需设置`torch_dtype=torch.float16`并使用`nunchaku-fp16`注意力模块,完整示例见[`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py)
### FP16 Attention
## Demos
- FLUX.1 models
  - Text-to-image: see [`app/flux.1/t2i`](app/flux.1/t2i)
  - Sketch-to-image ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)): see [`app/flux.1/sketch`](app/flux.1/sketch)
  - Depth/Canny-to-image ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)): see [`app/flux.1/depth_canny`](app/flux.1/depth_canny)
  - Inpainting ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)): see [`app/flux.1/fill`](app/flux.1/fill)
  - Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)): see [`app/flux.1/redux`](app/flux.1/redux)
- SANA:
  - Text-to-image: see [`app/sana/t2i`](app/sana/t2i)
## Custom Model Quantization
See [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the April roadmap.
## Contributing

We warmly welcome community contributions! See the [contribution guide](docs/contribution_guide_ZH.md) to learn how to contribute code to Nunchaku.
## Troubleshooting
## Related Projects
- [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
- [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
- [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
- [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
- [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Citation
This interactive Gradio application transforms your uploaded image into a different one.
The base models are:
- [FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) (preserves the depth map)
- [FLUX.1-Canny-dev](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) (preserves Canny edges)
First you need to install some dependencies:
Then run:

```shell
python run_gradio.py
```
- By default, the model is `FLUX.1-Depth-dev`. You can add `-m canny` to switch to `FLUX.1-Canny-dev`.
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
This interactive Gradio application allows you to interactively inpaint an uploaded image.

```shell
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
This interactive Gradio application allows you to interactively generate image variations.

```shell
python run_gradio.py
```
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
To launch the application, simply run:

```shell
python run_gradio.py
```
- The demo loads the Gemma-2B model as a safety checker by default. To disable this feature, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, we use our INT4 model. Use `-p bf16` to switch to the BF16 model.
To launch the application, simply run:

```shell
python run_gradio.py
```
- The demo also defaults to the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- To further reduce GPU memory usage, you can enable the W4A16 text encoder by specifying `--use-qencoder`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model.
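The `-p` flag above accepts one or two precisions, and each listed model is loaded for side-by-side comparison. That semantics can be sketched as a tiny helper; this is an illustrative reimplementation, not the script's actual code:

```python
def models_to_load(precisions):
    # `-p int4 bf16` loads both DiTs side by side; `-p bf16` loads only BF16.
    # Illustrative only; flag semantics taken from the option list above.
    return ("int4" in precisions, "bf16" in precisions)
```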
## Command Line Inference
We provide a script, [generate.py](generate.py), that generates an image from a text prompt:

```shell
python generate.py --prompt "You Text Prompt" python generate.py --prompt "You Text Prompt"
``` ```
- The generated image will be saved as `output.png` by default. You can specify a different path using the `-o` or `--output-path` options.
- The script defaults to using the FLUX.1-schnell model. To switch to the FLUX.1-dev model, use `-m dev`.
- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.
- You can specify `--use-qencoder` to use our W4A16 text encoder.
- You can adjust the number of inference steps and guidance scale with `-t` and `-g`, respectively. For the FLUX.1-schnell model, the defaults are 4 steps and a guidance scale of 0; for the FLUX.1-dev model, the defaults are 50 steps and a guidance scale of 3.5.
- When using the FLUX.1-dev model, you also have the option to load a LoRA adapter with `--lora-name`. Available choices are `None`, [`Anime`](https://huggingface.co/alvdansen/sonny-anime-fixed), [`GHIBSKY Illustration`](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration), [`Realism`](https://huggingface.co/XLabs-AI/flux-RealismLora), [`Children Sketch`](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch), and [`Yarn Art`](https://huggingface.co/linoyts/yarn_art_Flux_LoRA), with the default set to `None`. You can also specify the LoRA weight with `--lora-weight`, which defaults to 1.
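Putting the options above together, a FLUX.1-dev run with the quantized text encoder and a LoRA adapter might look like this sketch (the prompt, LoRA weight, and output name are illustrative):

```shell
# Hypothetical run: FLUX.1-dev, INT4 DiT, W4A16 text encoder,
# 50 steps with guidance 3.5 (the dev defaults, shown explicitly),
# and the Yarn Art LoRA at an illustrative weight of 0.8.
python generate.py --prompt "a watercolor fox in a snowy forest" \
    -m dev -p int4 --use-qencoder -t 50 -g 3.5 \
    --lora-name "Yarn Art" --lora-weight 0.8 -o fox.png
```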
## Latency Benchmark
To measure the latency of our INT4 models, use the following command:

```
python latency.py
```
- The script defaults to the INT4 FLUX.1-schnell model. To switch to FLUX.1-dev, use the `-m dev` option. For BF16 precision, add `-p bf16`.
- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup-times` and `--test-times`. The defaults are 2 warmup runs and 10 test runs.
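As an illustration of the flags above, a per-step benchmark with extra runs for tighter timing might be invoked like this (the run counts are arbitrary choices, not recommendations):

```shell
# Hypothetical benchmark: latency of a single INT4 FLUX.1-dev DiT
# forward step, measured over 20 test runs after 5 warmup runs.
python latency.py -m dev --mode step --warmup-times 5 --test-times 20
```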
## Quality Results
```
python evaluate.py -p int4
python evaluate.py -p bf16
```
- The commands above will generate images from FLUX.1-schnell on both datasets. Use `-m dev` to switch to FLUX.1-dev, or specify a single dataset with `-d MJHQ` or `-d DCI`.
- By default, generated images are saved to `results/$MODEL/$PRECISION`. Customize the output path using the `-o` option if desired.
- You can also adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively.
  - For FLUX.1-schnell, the defaults are 4 steps and a guidance scale of 0.
  - For FLUX.1-dev, the defaults are 50 steps and a guidance scale of 3.5.
- To accelerate the generation process, you can distribute the workload across multiple GPUs. For instance, if you have $N$ GPUs, on GPU $i$ ($0 \le i < N$) you can add the options `--chunk-start $i --chunk-step $N`. This setup ensures each GPU handles a distinct portion of the workload, enhancing overall efficiency.
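The chunking scheme above can be sketched as a small launcher loop. How a process is pinned to GPU $i$ is not specified here; setting `CUDA_VISIBLE_DEVICES` is one common assumption, and this sketch only prints the commands rather than running them:

```shell
# Sketch: emit one evaluate.py command per GPU so that GPU i handles
# chunk i of N (--chunk-start i --chunk-step N). GPU pinning via
# CUDA_VISIBLE_DEVICES is an assumption, not part of the script's docs.
N=4  # number of GPUs (illustrative)
i=0
while [ "$i" -lt "$N" ]; do
    echo "CUDA_VISIBLE_DEVICES=$i python evaluate.py -p int4 --chunk-start $i --chunk-step $N &"
    i=$((i + 1))
done
```

Piping the printed lines to `sh` (or pasting them into separate terminals) would start one worker per GPU.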
Finally, you can compute the metrics for the images with