Commit 4809f743 authored by Haotian Tang's avatar Haotian Tang

[Minor] Update README.md.

parent 3405f260
@@ -14,11 +14,11 @@ The current release supports:
![TinyChat on RTX 4090: W4A16 is 2.3x faster than FP16](./tinychat/figures/4090_example.gif)
Check out [TinyChat](tinychat), which delivers 2.3x faster inference performance for the **LLaMA-2** chatbot on RTX 4090!
## News
- [2023/07] 🔥 We released **TinyChat**, an efficient and minimal chatbot interface based on AWQ. TinyChat enables efficient LLM inference on both cloud and edge GPUs. Llama-2-chat models are supported! Check out our implementation [here](tinychat).
- [2023/07] 🔥 We added AWQ support and pre-computed search results for Llama-2 models (7B & 13B). Check out our model zoo [here](https://huggingface.co/datasets/mit-han-lab/awq-model-zoo)!
- [2023/07] We extended the support to more LLM models, including MPT, Falcon, and BLOOM.
......
# TinyChat: Efficient and Minimal Chatbot with AWQ
We introduce TinyChat, a cutting-edge chatbot interface designed for minimal resource consumption and fast inference speed on GPU platforms. It allows for seamless deployment on consumer-level GPUs such as the 3090/4090 and on low-power edge devices like the NVIDIA Jetson Orin, empowering users with a responsive conversational experience like never before.
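The speedup above comes from W4A16 execution: weights are stored in 4-bit groups with per-group scales, while activations stay in 16-bit floats. The sketch below is not AWQ's actual kernel or search procedure; it is a minimal, illustrative NumPy example of group-wise 4-bit weight quantization and dequantization, with the group size and helper names chosen here for illustration only.

```python
import numpy as np

def quantize_w4(weights, group_size=128):
    """Quantize a 1-D float array to unsigned 4-bit codes with per-group scales.

    Illustrative only: real W4A16 kernels pack two codes per byte and fuse
    dequantization into the matmul.
    """
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0          # 4 bits -> 16 levels (0..15)
    codes = np.round((w - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize_w4(codes, scale, w_min):
    """Recover approximate float weights from the 4-bit codes."""
    return (codes.astype(np.float32) * scale + w_min).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
codes, scale, w_min = quantize_w4(w)
w_hat = dequantize_w4(codes, scale, w_min)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Storing each weight in 4 bits (plus a small per-group scale/offset) cuts weight memory roughly 4x versus FP16, which is why weight-only quantization helps most on memory-bound generation workloads.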
......
../../llm-awq-chat-dev/tinychat/awq_cache
\ No newline at end of file