update demo

Commit 9efa8822 authored Nov 07, 2024 by Muyang Li

Showing with 3 additions and 8 deletions

README.md +3 -8
assets/demo.gif +0 -0
assets/speed_demo.gif +0 -0

--- a/README.md
+++ b/README.md
@@ -11,7 +11,9 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activat
 Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, and Song Han <br>
 *MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs* <br>

-## Overview
+![teaser](./assets/demo.gif)
+
+## Method

 #### Quantization Method -- SVDQuant

@@ -26,13 +28,6 @@ Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie,

 ![efficiency](./assets/efficiency.jpg)SVDQuant reduces the model size of the 12B FLUX.1 by 3.6×. Additionally, Nunchaku, further cuts memory usage of the 16-bit model by 3.5× and delivers 3.0× speedups over the NF4 W4A16 baseline on both the desktop and laptop NVIDIA RTX 4090 GPUs. Remarkably, on laptop 4090, it achieves in total 10.1× speedup by eliminating CPU offloading.

-<p align="center">
-  <img src="./assets/speed_demo.gif" width="80%"/>
-</p>
-
-
-
-
 ## Installation
 1. Install dependencies:
 	```shell

--- a/assets/demo.gif
+++ b/assets/demo.gif
--- a/assets/speed_demo.gif
+++ b/assets/speed_demo.gif