Commit d57a18e1 authored by muyangli

[minor] update README

parent c17a2f6e
@@ -4,7 +4,7 @@ Nunchaku is an inference engine designed for 4-bit diffusion models, as demonstr
 ### [Paper](http://arxiv.org/abs/2411.05007) | [Project](https://hanlab.mit.edu/projects/svdquant) | [Blog](https://hanlab.mit.edu/blog/svdquant) | [Demo](https://svdquant.mit.edu)
-- **[2025-02-11]** 🔥 **FLUX.1-tools Gradio demos are now available!** Check [here] for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux.1-depth-dev/) is also online—try it out!
+- **[2025-02-11]** 🔥 **FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux.1-depth-dev/) is also online—try it out!
 - **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
 - **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
 - **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
......
@@ -128,7 +128,7 @@ Tensor Attention::forward(Tensor qkv, Tensor pool_qkv, float sparsityRatio) {
     assert(qkv.shape[2] == num_heads * dim_head * 3);
     constexpr int POOL_SIZE = 128;
-    const int pool_tokens = num_tokens / POOL_SIZE;
+    const int pool_tokens = ceilDiv(num_tokens, POOL_SIZE);
     Tensor blockmask;
......
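Note on the `pool_tokens` change: integer division truncates, so whenever `num_tokens` is not a multiple of `POOL_SIZE` the old expression does not count the trailing partial pool, while `ceilDiv` rounds up and includes it. The sketch below illustrates only that arithmetic. The definition of `ceilDiv` is not part of this diff, so `ceil_div_sketch` is a hypothetical stand-in, and the two `pool_tokens_*` wrappers are illustrative names, not code from the repository.

```cpp
#include <cassert>

// Hypothetical stand-in for the repository's ceilDiv helper, whose
// definition is not shown in this diff. Assumes a >= 0 and b > 0.
inline int ceil_div_sketch(int a, int b) {
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

// Same value as the constant in Attention::forward.
constexpr int POOL_SIZE = 128;

// Old expression: truncating division drops a trailing partial pool.
inline int pool_tokens_floor(int num_tokens) {
    return num_tokens / POOL_SIZE;
}

// New expression: a trailing partial pool still counts as one pool.
inline int pool_tokens_ceil(int num_tokens) {
    return ceil_div_sketch(num_tokens, POOL_SIZE);
}
```

For example, with 300 tokens the old expression gives 2 pools and the new one gives 3; with fewer than 128 tokens the old expression gives 0. The two agree whenever the token count is an exact multiple of 128.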
@@ -1209,7 +1209,7 @@ public:
     const bool is_q = bn < binfo.numBlocksN / 3;
     const bool is_k = !is_q && bn < binfo.numBlocksN / 3 * 2;
-    assert(args.actualM == M);
+    assert(!args.pool_out || args.actualM == M);
     assert(args.actualN == N);
     if (is_q || is_k) {
......
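Note on the relaxed assertion: `!args.pool_out || args.actualM == M` is the usual way to write "if `pool_out` is set, then `actualM` must equal `M`", so the equality is no longer enforced when no pooled output is requested. The sketch below mirrors only that logic. It assumes `pool_out` behaves like a nullable pointer, which the diff suggests but does not show; the function name and parameters are hypothetical.

```cpp
// Hypothetical mirror of the relaxed check; names are illustrative only
// and pool_out is assumed to be a nullable pointer.
inline bool m_check_passes(const void* pool_out, int actualM, int M) {
    // "A implies B" written as "!A || B":
    // with no pooled output the check always passes,
    // otherwise actualM must match M exactly.
    return !pool_out || actualM == M;
}
```

Under this reading, a call without a pooled output passes even when `actualM` and `M` differ, whereas the previous assertion rejected any mismatch.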