"driver/driver.hip.cpp" did not exist on "07f16673c98ab0952fe09a58e362714f58998ebb"
Commit 95273623 authored by Lei Wang, committed by GitHub

[Docs] Update Figures from Arxiv Version (#11)

parent 57ab687c
@@ -9,17 +9,7 @@
 Tile Language (**tile-lang**) is a concise domain-specific language designed to streamline the development of high-performance GPU/CPU kernels (e.g., GEMM, Dequant GEMM, FlashAttention, LinearAttention). By employing a Pythonic syntax with an underlying compiler infrastructure on top of [TVM](https://tvm.apache.org/), tile-lang allows developers to focus on productivity without sacrificing the low-level optimizations necessary for state-of-the-art performance.
 ## Tested Devices
-Although tile-lang aims to be portable across a range of Devices, it has been specifically tested and validated on the following devices:
+Although tile-lang aims to be portable across a range of Devices, it has been specifically tested and validated on the following devices: for NVIDIA GPUs, this includes the H100 (with Auto TMA/WGMMA support), A100, V100, RTX 4090, RTX 3090, and RTX A600; for AMD GPUs, it includes the MI250 (with Auto MatrixCore support) and the MI300X (with Async Copy support).
-- **NVIDIA GPUS**:
-- H100 (**with Auto TMA/WGMMA Support**),
-- A100
-- V100
-- RTX 4090
-- RTX 3090
-- RTX A600
-- **AMD GPUS**:
-- MI250 (**with Auto MatrixCore Support**)
-- MI300 (**with Async Copy Support**)
 ## OP Implementation Examples
 **tile-lang** provides the building blocks to implement a wide variety of operators. Some examples include:
@@ -35,16 +25,16 @@ Within the `examples` repository, you will also find additional complex kernels
 TileLang achieves exceptional performance across a variety of computational patterns. Below are selected results showcasing its capabilities:
-- Operator Performance Vs. Baselines on H100
+- Flash Attention Performance on H100
 <div>
-<img src="./images/op_benchmark_h100.png" alt="operator performance on H100" />
+<img src="./images/mha_performance_h100.png" alt="operator performance on H100" />
 </div>
-- MatrixCore FP16 GEMM Performance Vs. Baselines on MI300X
+- Matmul Performance on GPUs (RTX 4090, A100, H100, MI300X)
 <div>
-<img src="./images/op_benchmark_mi300_fp16_gemm_normalized_latency.png" alt="gemm fp16 performance on MI300X" />
+<img src="./images/op_benchmark_consistent_gemm_fp16.png" alt="gemm fp16 performance on Gpus" />
 </div>
 ## Installation
......
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.