1. 05 Sep, 2025 1 commit
    • docs: add the docstrings for v1.0.0 (#656) · 070c45bb
      Muyang Li authored
      * add v2 flux examples
      
      * add the docs
      
      * add docs
      
      * update
      
      * finished ops
      
      * add ops
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update docstrings
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * finished the api docs
      
      * update
      
      * update
  2. 03 Sep, 2025 1 commit
    • feat: async CPU offloading for Python backend (#624) · eb901251
      Muyang Li authored
      * tmp
      
      * update
      
      * update
      
      * finished the offloading impl
      
      * the offloading is buggy
      
      * update utils
      
      * the offloading is still buggy
      
      * update
      
      * correctness and speedup done; need to check the vram overhead
      
      * done
      
      * final debugging
      
      * update
      
      * update
      
      * correct now
      
      * fix
      
      * update
      
      * use per-layer offloading
      
      * fix the offloading on 5090
      
      * support setting the num_blocks_on_gpu
      
      * change the import name
  3. 27 Aug, 2025 2 commits
  4. 15 Aug, 2025 3 commits
    • chore: fix a typo · 17c7154a
      Muyang Li authored
    • chore: update the qwen-image example · d797a26d
      Muyang Li authored
    • feat: pythonized model and QwenImage Support (#593) · f86ad470
      Muyang Li authored
      * start refactoring the codebase
      
      * update
      
      * update
      
      * start to implement ops
      
      * add gemm
      
      * write the docstrings
      
      * define the w4a4 svdq linear
      
      * update
      
      * make the linter happy
      
      * finished the SVDQW4A4Linear
      
      * finished the SVDQW4A4Linear
      
      * update
      
      * update
      
      * add a patcher to the model
      
      * update
      
      * add adanormsinglezero
      
      * update
      
      * update
      
      * finished the naive implementation of nunchaku flux
      
      * add ff
      
      * finished the naive forward
      
      * update
      
      * svdq linear
      
      * start debugging
      
      * fix some issues
      
      * successfully built the model
      
      * update
      
      * successfully load the model
      
      * update
      
      * update
      
      * update
      
      * try to make it runnable
      
      * debugging
      
      * debugging
      
      * debugging
      
      * add bias to awq linear
      
      * run through
      
      * fix the normalization
      
      * update
      
      * update
      
      * update
      
      * fix the attention
      
      * fix the no fuse nvfp models
      
      * update
      
      * finished the fused ff
      
      * make linter happy
      
      * make linter happy
      
      * make linter happy
      
      * debugging the fp16 attn
      
      * nunchaku fp16 is buggy
      
      * finish the fp16 attn
      
      * fp4 done
      
      * fix the lora scales
      
      * add a default value for alpha; need to debug int4
      
      * fix input4
      
      * update
      
      * update
      
      * ff does not work
      
      * specialize the processors
      
      * qwen transformer done. start debugging
      
      * make linter happy
      
      * add schnell v2 for metrics eval
      
      * chore: schnellv2 eval
      
      * update
      
      * ff and attention correct
      
      * need to check what happened to module
      
      * fp4 done
      
      * make linter happy
      
      * update an example script
      
      * reformat
      
      * add an example script
      
      * add the announcement
      
      * remove misleading info
      
      * ready to release