  1. 05 Sep, 2025 1 commit
    • docs: add the docstrings for v1.0.0 (#656) · 070c45bb
      Muyang Li authored
      * add v2 flux examples
      
      * add the docs
      
      * add docs
      
      * update
      
      * finished ops
      
      * add ops
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update docstrings
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * finished the api docs
      
      * update
      
      * update
  2. 03 Sep, 2025 1 commit
    • feat: async CPU offloading for Python backend (#624) · eb901251
      Muyang Li authored
      * tmp
      
      * update
      
      * update
      
      * finished the offloading impl
      
      * the offloading is buggy
      
      * update utils
      
      * the offloading is still buggy
      
      * update
      
      * correctness and speedup done; need to check the vram overhead
      
      * done
      
      * final debugging
      
      * update
      
      * update
      
      * correct now
      
      * fix
      
      * update
      
      * use per-layer offloading
      
      * fix the offloading on 5090
      
      * support setting the num_blocks_on_gpu
      
      * change the import name
  3. 27 Aug, 2025 2 commits
  4. 15 Aug, 2025 3 commits
    • chore: fix a typo · 17c7154a
      Muyang Li authored
    • chore: update the qwen-image example · d797a26d
      Muyang Li authored
    • feat: pythonized model and QwenImage Support (#593) · f86ad470
      Muyang Li authored
      * start refactoring the codebase
      
      * update
      
      * update
      
      * start to implement ops
      
      * add gemm
      
      * write the docstrings
      
      * define the w4a4 svdq linear
      
      * update
      
      * make the linter happy
      
      * finished the SVDQW4A4Linear
      
      * finished the SVDQW4A4Linear
      
      * update
      
      * update
      
      * add a patcher to the model
      
      * update
      
      * add adanormsinglezero
      
      * update
      
      * update
      
      * finished the naive implementation of nunchaku flux
      
      * add ff
      
      * finished the naive forward
      
      * update
      
      * svdq linear
      
      * start debugging
      
      * fix some issues
      
      * successfully built the model
      
      * update
      
      * successfully load the model
      
      * update
      
      * update
      
      * update
      
      * try to make it runnable
      
      * debugging
      
      * debugging
      
      * debugging
      
      * add bias to awq linear
      
      * run through
      
      * fix the normalization
      
      * update
      
      * update
      
      * update
      
      * fix the attention
      
      * fix the no fuse nvfp models
      
      * update
      
      * finished the fused ff
      
      * make linter happy
      
      * make linter happy
      
      * make linter happy
      
      * debugging the fp16 attn
      
      * nunchaku fp16 is bug...