[Minor] Update README.md

Unverified commit 043f17bf, authored Jul 31, 2023 by Haotian (Ken) Tang, committed by GitHub Jul 31, 2023

Showing with 1 addition and 1 deletion

tinychat/README.md +1 -1

--- a/tinychat/README.md
+++ b/tinychat/README.md
@@ -44,7 +44,7 @@ Thanks to AWQ, TinyChat can now deliver more prompt responses through 4-bit infe

 We benchmark TinyChat on A6000 (server-class GPU), 4090 (desktop GPU) and Orin (edge GPU).

-We use the default implementation from Huggingface for the FP16 baseline. The INT4 implementation applies AWQ and utilizes our fast W4A16 GPU kernel. Please notice that the end-to-end runtime for INT4 TinyChat could be further improved if we reduce the framework overhead from Huggingface (e.g. utilizing the implementation from TGI). 
+We use the default implementation from Huggingface for the FP16 baseline. The INT4 implementation applies AWQ and utilizes our fast W4A16 GPU kernel. Please notice that the end-to-end runtime for INT4 TinyChat could be further improved if we reduce the framework overhead from Huggingface (e.g. utilizing the implementation from TGI). We are working on a new release with even faster inference performance, please stay tuned!

 The latency reported in all tables are per-token latency for the generation stage.