[minor] update README

Commit d57a18e1 authored Feb 11, 2025 by muyangli

Showing 3 changed files with 3 additions and 3 deletions

README.md +1 -1

src/FluxModel.cpp +1 -1

src/kernels/zgemm/gemm_w4a4.cuh +1 -1

--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ Nunchaku is an inference engine designed for 4-bit diffusion models, as demonstr
 ### [Paper](http://arxiv.org/abs/2411.05007) | [Project](https://hanlab.mit.edu/projects/svdquant) | [Blog](https://hanlab.mit.edu/blog/svdquant) | [Demo](https://svdquant.mit.edu)
- **[2025-02-11]** 🔥 **FLUX.1-tools Gradio demos are now available!** Check [here] for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux.1-depth-dev/) is also online—try it out!
+- **[2025-02-11]** 🔥 **FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux.1-depth-dev/) is also online—try it out!
 - **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
 - **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
 - **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!

--- a/src/FluxModel.cpp
+++ b/src/FluxModel.cpp
@@ -128,7 +128,7 @@ Tensor Attention::forward(Tensor qkv, Tensor pool_qkv, float sparsityRatio) {
    assert(qkv.shape[2] == num_heads * dim_head * 3);
    constexpr int POOL_SIZE = 128;
-    const int pool_tokens = num_tokens / POOL_SIZE;
+    const int pool_tokens = ceilDiv(num_tokens, POOL_SIZE);
    Tensor blockmask;
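
Note on the hunk above: switching from truncating division to `ceilDiv` matters whenever `num_tokens` is not a multiple of `POOL_SIZE`, because the trailing partial block would otherwise not be counted. The sketch below illustrates the difference. The `ceil_div` helper and the two wrapper functions are hypothetical names written for this note; the project's own `ceilDiv` may be defined differently.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not the project's ceilDiv.
// Rounds the quotient up. Assumes a >= 0 and b > 0.
constexpr int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

constexpr int POOL_SIZE = 128;

// Old behaviour: integer division truncates, dropping a partial last block.
constexpr int pool_tokens_floor(int num_tokens) {
    return num_tokens / POOL_SIZE;
}

// New behaviour: a partial last block still counts as one pooled token.
constexpr int pool_tokens_ceil(int num_tokens) {
    return ceil_div(num_tokens, POOL_SIZE);
}

// 300 tokens span two full blocks plus a partial one.
static_assert(pool_tokens_floor(300) == 2, "truncation drops the partial block");
static_assert(pool_tokens_ceil(300) == 3, "rounding up keeps the partial block");
// Exact multiples give the same result either way.
static_assert(pool_tokens_ceil(256) == 2, "no extra block for exact multiples");
```

The two forms agree when `num_tokens` is an exact multiple of `POOL_SIZE` and differ by one otherwise.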

--- a/src/kernels/zgemm/gemm_w4a4.cuh
+++ b/src/kernels/zgemm/gemm_w4a4.cuh
@@ -1209,7 +1209,7 @@ public:
            const bool is_q = bn < binfo.numBlocksN / 3;
            const bool is_k = !is_q && bn < binfo.numBlocksN / 3 * 2;
-            assert(args.actualM == M);
+            assert(!args.pool_out || args.actualM == M);
            assert(args.actualN == N);
            if (is_q || is_k) {
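
Note on the hunk above: the relaxed assertion has the form `!A || B`, which is the usual way to write "A implies B". Read that way, the `actualM == M` requirement is only enforced when `args.pool_out` is set. The sketch below spells out that truth table. It assumes `pool_out` behaves like a flag or nullable pointer, and `shape_check_ok` is a hypothetical name used only here.

```cpp
#include <cassert>

// Hypothetical stand-in for the relaxed condition in the diff.
// has_pool_out models whether args.pool_out is set (an assumption here).
constexpr bool shape_check_ok(bool has_pool_out, int actualM, int M) {
    // "has_pool_out implies actualM == M"
    return !has_pool_out || actualM == M;
}

// No pooled output requested: a size mismatch is tolerated.
static_assert(shape_check_ok(false, 100, 128), "mismatch allowed without pool_out");
// Pooled output requested: the sizes must match.
static_assert(shape_check_ok(true, 128, 128), "match required with pool_out");
static_assert(!shape_check_ok(true, 100, 128), "mismatch rejected with pool_out");
```

Under this reading, the old assertion corresponds to always taking the `has_pool_out == true` branch.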