"host/online_compile/include/kernel_cache.hpp" did not exist on "d2315b0dfcd6f31cca4328819eaf60d77e952dd6"
Commit 3a14e12d, authored by Lei Wang, committed by GitHub

[Doc] Update the example figures in README (#3)

* update readme

* center and resize benchmark figures
parent ece99756
@@ -2,12 +2,12 @@
# Tile Language
<img src=./images/logo-row.svg />
</div>
Tile Language (**tile-lang**) is a concise domain-specific language designed to streamline the development of high-performance GPU/CPU kernels (e.g., GEMM, Dequant GEMM, FlashAttention, LinearAttention). By employing a Pythonic syntax with an underlying compiler infrastructure on top of [TVM](https://tvm.apache.org/), tile-lang allows developers to focus on productivity without sacrificing the low-level optimizations necessary for state-of-the-art performance.
<img src=./images/MatmulExample.svg />
## Tested Devices
Although tile-lang aims to be portable across a range of Devices, it has been specifically tested and validated on the following devices: for NVIDIA GPUs, this includes the H100 (with Auto TMA/WGMMA support), A100, V100, RTX 4090, RTX 3090, and RTX A600; for AMD GPUs, it includes the MI250 (with Auto MatrixCore support) and the MI300X (with Async Copy support).
@@ -27,8 +27,7 @@ TileLang achieves exceptional performance across a variety of computational patt
- Flash Attention Performance on H100
-<div>
-<img src="./images/mha_performance_h100.png" alt="operator performance on H100" />
+<div align="center"> <img src="./images/mha_performance_h100.png" alt="operator performance on H100" width=80% />
</div>
- Matmul Performance on GPUs (RTX 4090, A100, H100, MI300X)
...
File suppressed by a .gitattributes entry or the file's encoding is unsupported.