Commits · fd51fbd08c94e2161c6263080932a4898f22628a · fengzch-das / nunchaku

05 Sep, 2025 1 commit

docs: add the docstrings for v1.0.0 (#656) · 070c45bb

Muyang Li authored Sep 04, 2025

* add v2 flux examples

* add the docs

* add docs

* update

* finished ops

* add ops

* update

* update

* update

* update

* update

* update

* update

* update docstrings

* update

* update

* update

* update

* update

* update

* update

* finished the api docs

* update

* update

03 Sep, 2025 1 commit

feat: async CPU offloading for Python backend (#624) · eb901251

Muyang Li authored Sep 03, 2025

* tmp

* update

* update

* finished the offloading impl

* the offloading is buggy

* update utils

* the offloading is still buggy

* update

* correctness and speedup done; need to check the vram overhead

* done

* final debugging

* update

* update

* correct now

* fix

* update

* use per-layer offloading

* fix the offloading on 5090

* support setting the num_blocks_on_gpu

* change the import name

27 Aug, 2025 2 commits

feat: Implement V2 FBCaching and Optimize Existing FBCache (#621) · 882aa077

SMG authored Aug 28, 2025

* caching_v2

* rename fb cache and write docstring

* lint

* rename utils to fbcache

* no need to maintain sana for caching

feat: support lightning Qwen-Image models (#641) · 7b0dbce5

Muyang Li authored Aug 27, 2025

* update

* update

* update README

* update docs

* update docs

* improve the lightning script

* update the example script

* change the repo name

15 Aug, 2025 3 commits

chore: fix a typo · 17c7154a

Muyang Li authored Aug 15, 2025

chore: update the qwen-image example · d797a26d

Muyang Li authored Aug 15, 2025

feat: pythonized model and QwenImage Support (#593) · f86ad470

Muyang Li authored Aug 15, 2025

* start refactoring the codebase

* update

* update

* start to implement ops

* add gemm

* write the docstrings

* define the w4a4 svdq linear

* update

* make the linter happy

* finished the SVDQW4A4Linear

* finished the SVDQW4A4Linear

* update

* update

* add a patcher to the model

* update

* add adanormsinglezero

* update

* update

* finished the naive implementation of nunchaku flux

* add ff

* finished the naive forward

* update

* svdq linear

* start debugging

* fix some issues

* successfully built the model

* update

* successfully load the model

* update

* update

* update

* try to make it runnable

* debugging

* debugging

* debugging

* add bias to awq linear

* run through

* fix the normalization

* update

* update

* update

* fix the attention

* fix the no fuse nvfp models

* update

* finished the fused ff

* make linter happy

* make linter happy

* make linter happy

* debugging the fp16 attn

* nunchaku fp16 is bug...
