Commit 43f55f7d authored by Cunxiao Ni, committed by GitHub

[Doc] update installation.md and readme (#22)

* [Doc] update installation.md and readme

* solve conflicts

* change readme

* fix installation.rst

* fix readme

* fix installation
parent 8e7feca9
......@@ -71,9 +71,9 @@ pip install . # with -e option if you want to install in editable mode
### Method 2: Build from Source
We currently provide three ways to install **tile-lang** from source:
- [Install from Source (using your own TVM installation)](./docs/Installation.md#install-from-source-with-your-own-tvm-installation)
- [Install from Source (using the bundled TVM submodule)](./docs/Installation.md#install-from-source-with-our-tvm-submodule)
- [Install Using the Provided Script](./docs/Installation.md#install-with-provided-script)
- [Install from Source (using your own TVM installation)](./docs/get_started/Installation.rst#method-1-install-from-source-using-your-own-tvm-installation)
- [Install from Source (using the bundled TVM submodule)](./docs/get_started/Installation.rst#method-2-install-from-source-with-our-tvm-submodule)
- [Install Using the Provided Script](./docs/get_started/Installation.rst#method-3-install-using-the-provided-script)
## Quick Start
......@@ -192,6 +192,12 @@ In addition to GEMM, we provide a variety of examples to showcase the versatilit
TileLang has now been used in project [BitBLAS](https://github.com/microsoft/BitBLAS).
## Join the Discussion
Welcome to join our Discord community for discussions, support, and collaboration!
[![Join our Discord](https://img.shields.io/badge/Discord-Join%20Us-blue?logo=discord&style=for-the-badge)](https://discord.gg/TUrHyJnKPG)
## Acknowledgements
We learned a lot from the [TVM](https://github.com/apache/tvm) community and would like to thank them for their contributions. The initial version of this project is mainly contributed by [LeiWang1999](https://github.com/LeiWang1999), [chengyupku](https://github.com/chengyupku) and [nox-410](https://github.com/nox-410). Part of this work was done during the internship at Microsoft Research, under the supervision of Dr. Lingxiao Ma, Dr. Yuqing Xia, Dr. Jilong Xue, and Dr. Fan Yang.
# Installation Guide
## Installing with pip
**Prerequisites for installation via wheel or PyPI:**
- **Operating System**: Ubuntu 20.04 or later
- **Python Version**: >= 3.8
- **CUDA Version**: >= 11.0
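The Python and CUDA minimums listed above can be checked with a few lines of plain Python. This is only a sketch: `parse_version` and `meets_wheel_prerequisites` are hypothetical helper names, not part of TileLang, the CUDA version is supplied by hand rather than detected, and the Ubuntu 20.04 requirement is not checked.

```python
import platform
import sys


def parse_version(text):
    """Turn a dotted version string such as "11.8" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_wheel_prerequisites(python_version, cuda_version):
    """Check the Python >= 3.8 and CUDA >= 11.0 minimums listed above."""
    return (parse_version(python_version) >= (3, 8)
            and parse_version(cuda_version) >= (11, 0))


if __name__ == "__main__":
    # The CUDA version is typed in by hand here (assumption); detecting it
    # depends on the local toolkit installation and is out of scope.
    current_python = "{}.{}".format(*sys.version_info[:2])
    print(platform.system(), current_python,
          meets_wheel_prerequisites(current_python, "12.0"))
```

Comparing tuples of integers rather than strings keeps versions such as 3.10 ordered correctly against 3.8.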
The easiest way to install TileLang is directly from PyPI using pip. To install the latest version, run the following command in your terminal.
**Note**: TileLang wheels are currently supported only on Ubuntu 20.04 or later, because the wheel files are built on that platform. Wheels are provided only for CUDA >= 11.0 and Python >= 3.8. **If you are using a different platform or environment, you may need to [build TileLang from source](https://github.com/tile-ai/tilelang/blob/main/docs/Installation.md#building-from-source).**
```bash
pip install tilelang
```
Alternatively, you may choose to install TileLang using prebuilt packages available on the Release Page:
```bash
pip install tilelang-0.0.0.dev0+ubuntu.20.4.cu120-py3-none-any.whl
```
To install the latest version of TileLang from the GitHub repository, run the following command:
```bash
pip install git+https://github.com/tile-ai/tilelang.git
```
After installing TileLang, you can verify the installation by running:
```bash
python -c "import tilelang; print(tilelang.__version__)"
```
## Building from Source
**Prerequisites for building from source:**
- **Operating System**: Linux
- **Python Version**: >= 3.7
- **CUDA Version**: >= 10.0
We recommend building TileLang from source inside a Docker container that already has the necessary dependencies. You can start such a container with the following command:
```bash
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.01-py3
```
To build and install TileLang directly from source, follow the steps below. This process requires certain prerequisites from Apache TVM, which can be installed on Ubuntu/Debian-based systems using the following commands:
```bash
sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
```
After installing the prerequisites, you can clone the TileLang repository and install it using pip:
```bash
git clone --recursive https://github.com/tile-ai/tilelang.git
cd tilelang
pip install . # Please be patient, this may take some time.
```
If you want to install TileLang in development mode, you can run the following command:
```bash
pip install -e .
```
We currently provide three ways to install **tile-lang**:
- [Install from Source (using your own TVM installation)](#install-from-source-with-your-own-tvm-installation)
- [Install from Source (using the bundled TVM submodule)](#install-from-source-with-our-tvm-submodule)
- [Install Using the Provided Script](#install-with-provided-script)
### Method 1: Install from Source (using your own TVM installation)
If you already have a compatible TVM installation, follow these steps:
1. **Clone the Repository:**
```bash
git clone --recursive https://github.com/tile-ai/tilelang
cd tilelang
```
> **Note**: Use the `--recursive` flag to include necessary submodules.
2. **Configure Build Options:**
Create a build directory and specify your existing TVM path:
```bash
mkdir build
cd build
cmake .. -DTVM_PREBUILD_PATH=/your/path/to/tvm/build # e.g., /workspace/tvm/build
make -j 16
```
3. **Set Environment Variables:**
Update `PYTHONPATH` to include the `tile-lang` Python module:
```bash
export PYTHONPATH=/your/path/to/tile-lang/python:$PYTHONPATH
# TVM_IMPORT_PYTHON_PATH is used by 3rdparty framework to import tvm
export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python
```
### Method 2: Install from Source (using the bundled TVM submodule)
If you prefer to use the built-in TVM version, follow these instructions:
1. **Clone the Repository:**
```bash
git clone --recursive https://github.com/tile-ai/tilelang
cd tilelang
```
> **Note**: Ensure the `--recursive` flag is included to fetch submodules.
2. **Configure Build Options:**
Copy the configuration file and enable the desired backends (e.g., LLVM and CUDA):
```bash
mkdir build
cp 3rdparty/tvm/cmake/config.cmake build
cd build
echo "set(USE_LLVM ON)" >> config.cmake
echo "set(USE_CUDA ON)" >> config.cmake
# or echo "set(USE_ROCM ON)" >> config.cmake if you want to enable the ROCm runtime
cmake ..
make -j 16
```
The build outputs (e.g., `libtilelang.so`, `libtvm.so`, `libtvm_runtime.so`) will be generated in the `build` directory.
3. **Set Environment Variables:**
Ensure the `tile-lang` Python package is in your `PYTHONPATH`:
```bash
export PYTHONPATH=/your/path/to/TileLang/python:$PYTHONPATH
```
### Method 3: Install Using the Provided Script
For a simplified installation, use the provided script:
1. **Clone the Repository:**
```bash
git clone --recursive https://github.com/tile-ai/tilelang
cd tilelang
```
2. **Run the Installation Script:**
```bash
bash install.sh
# or `bash install_amd.sh` if you want to enable the ROCm runtime
```
This script automates the setup, including submodule initialization and configuration.
Flash-attention performance on an RTX 4090 GPU with CUDA toolkit 12.2.
SEQ_LEN is fixed at 2k. All matmuls use fp16->fp32 MMA. Values are in TFLOPS; higher is better.
Flash-Forward
| CAUSAL, DIM | Flash_attn | Tvm.tl |
| --------- | ---------- | ------ |
| False, 32 | 159.79 | 156.82 |
| False, 64 | 168.91 | 166.84 |
| False, 128 | 169.28 | 166.51 |
| False, 256 | 156.15 | 166.77 |
| True, 32 | 126.78 | 142.59 |
| True, 64 | 142.23 | 152.43 |
| True, 128 | 151.19 | 156.30 |
| True, 256 | 144.12 | 151.54 |
Flash-Backward
| CAUSAL, DIM | Flash_attn | Tvm.tl |
| --------- | ---------- | ------ |
| False, 32 | 115.12 | 120.03 |
| False, 64 | 124.81 | 130.94 |
| False, 128 | 124.57 | 122.99 |
| True, 32 | 86.48 | 95.66 |
| True, 64 | 96.53 | 106.03 |
| True, 128 | 99.23 | 100.24 |
# TVM.TL language reference
## T.Kernel
args: the grid size (0 to 3 dimensions) and num_threads.
returns: the blockIdx variables.
Launches a kernel. It must be used in a with statement. Multiple kernels can be launched sequentially inside one prim function.
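To make the relationship between the grid size and the returned blockIdx variables concrete, here is a plain-Python model. It does not call TileLang: `ceildiv` and `block_indices` are hypothetical helpers written for this illustration, and the 128x128 tile size is an assumption.

```python
import itertools


def ceildiv(a, b):
    """Integer division rounded up, used to size a grid from a problem size."""
    return -(-a // b)


def block_indices(*grid):
    """Enumerate the blockIdx tuples for a grid with 0 to 3 dimensions."""
    if len(grid) > 3:
        raise ValueError("a grid has at most 3 dimensions")
    return list(itertools.product(*(range(extent) for extent in grid)))


if __name__ == "__main__":
    # A 1024 x 512 output tiled into 128 x 128 blocks (sizes are assumptions).
    grid = (ceildiv(512, 128), ceildiv(1024, 128))
    print(grid, len(block_indices(*grid)))
```

A grid with zero dimensions yields a single empty index tuple, which corresponds to launching exactly one block.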
## T.alloc_shared
args: shape, dtype
returns: Buffer
Allocates a buffer in shared memory. It must be used within a T.Kernel scope and should be allocated at the top of the scope.
Dynamic shared memory is used.
## T.alloc_fragment
args: shape, dtype
returns: Buffer
Allocates a buffer in register memory. It must be used within a T.Kernel scope and should be allocated at the top of the scope.
The shape describes the whole buffer. Its elements are distributed across the threads, and the compiler infers this storage partition.
## T.copy
args: src, dst
Copies data from src to dst. Each of src and dst can be a Buffer, a BufferLoad, or a BufferRegion. A BufferLoad gives only a single starting point, so if one argument is a BufferLoad, the other must not be, since the copy region has to be known.
Loads that fall outside the buffer boundary are padded with zeros.
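The zero-padding behaviour can be pictured with a small plain-Python model of a tile copy. This is not how T.copy is implemented: `copy_tile` is a hypothetical helper, the source is a nested list, and the start indices are assumed to be non-negative.

```python
def copy_tile(src, row0, col0, tile_rows, tile_cols):
    """Copy a tile starting at (row0, col0); out-of-bounds elements read as 0."""
    rows, cols = len(src), len(src[0])
    tile = []
    for r in range(row0, row0 + tile_rows):
        tile.append([
            src[r][c] if r < rows and c < cols else 0
            for c in range(col0, col0 + tile_cols)
        ])
    return tile


if __name__ == "__main__":
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # The 2 x 2 tile at (2, 2) overlaps the matrix in only one element.
    print(copy_tile(matrix, 2, 2, 2, 2))
```

Padding with zeros means a tile that hangs over the edge of the source can still be processed at its full size.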
## T.gemm
args: A, B, C, transpose_A, transpose_B, policy
Performs a GEMM operation on A, B, and C. C must be a fragment, B must be in shared memory, and A can be either a fragment or in shared memory.
Note that the current implementation has some shape and dtype constraints. For example, the length of the reduction axis must be a multiple of 32 when the multiplicands are fp16. We will update this later.
## T.reduce_max T.reduce_sum
args: src, dst, dim
Performs a reduce operation from src to dst along dimension dim. Currently, src and dst must both be fragments.
## T.Parallel
You can use T.Parallel to write a loop. The compiler partitions the loop across all threads, taking into account the vectorization size, the fragments' thread mapping, and so on. Note that this is the only way to perform arbitrary operations on fragments.
## T.Pipelined
args: start, stop, num_stages
Pipelines the loop. Copies from global memory are converted to async operations and reordered to the point after it is consumed. num_stages is the number of buffers between producer and consumer (e.g., double buffering when num_stages=2).
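The role of num_stages can be sketched as a generic software-pipelining schedule in plain Python. The exact order the compiler emits is not documented here, so the schedule below is an assumption: `pipelined_schedule` is a hypothetical helper, and num_stages is assumed to be at least 1.

```python
def pipelined_schedule(num_iters, num_stages):
    """List (op, iteration, buffer slot) events for a software-pipelined loop."""
    events = []
    # Prologue: start the first num_stages - 1 loads before any compute.
    for i in range(min(num_stages - 1, num_iters)):
        events.append(("load", i, i % num_stages))
    for i in range(num_iters):
        nxt = i + num_stages - 1
        if nxt < num_iters:
            # The slot being refilled was consumed in the previous iteration.
            events.append(("load", nxt, nxt % num_stages))
        events.append(("compute", i, i % num_stages))
    return events


if __name__ == "__main__":
    for event in pipelined_schedule(num_iters=3, num_stages=2):
        print(event)
```

With num_stages=2 the two slots alternate, which is the double-buffer case: the load for the next iteration is in flight while the current iteration computes.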
## T.clear T.fill
Nothing special: they are converted to T.Parallel loops.
## T.use_swizzle
Optimization for the L2 cache. The launch order of blockIdx.x and blockIdx.y is serpentined.
Add it inside a kernel after all buffers have been allocated.
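A serpentine launch order can be illustrated in plain Python. This shows only the general idea of a snake-like traversal. The mapping the compiler actually applies may differ, and `serpentine_order` is a hypothetical helper, not a TileLang function.

```python
def serpentine_order(grid_x, grid_y):
    """Visit blocks row by row, reversing direction on every other row."""
    order = []
    for by in range(grid_y):
        xs = range(grid_x) if by % 2 == 0 else range(grid_x - 1, -1, -1)
        order.extend((bx, by) for bx in xs)
    return order


if __name__ == "__main__":
    # Consecutive entries are always neighbouring blocks in the grid.
    print(serpentine_order(3, 2))
```

Because consecutive blocks are always adjacent in the grid, blocks scheduled close together in time tend to touch nearby tiles, which is why this kind of ordering can help L2 cache reuse.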
......@@ -65,7 +65,7 @@ After installing the prerequisites, you can clone the TileLang repository and in
.. code:: bash
git clone --recursive https://github.com/tile-ai/tilelang.git
cd TileLang
cd tilelang
pip install . # Please be patient, this may take some time.
If you want to install TileLang in development mode, you can run the following command:
......@@ -76,11 +76,13 @@ If you want to install TileLang in development mode, you can run the following c
We currently provide three methods to install **TileLang**:
1. Install from Source (using your own TVM installation)
1. `Install from Source (using your own TVM installation)`_
2. `Install from Source (using the bundled TVM submodule)`_
3. `Install Using the Provided Script`_
2. Install from Source (using the bundled TVM submodule)
3. Install Using the Provided Script
.. _Install from Source (using your own TVM installation): #method-1-install-from-source-using-your-own-tvm-installation
.. _Install from Source (using the bundled TVM submodule): #method-2-install-from-source-using-the-bundled-tvm-submodule
.. _Install Using the Provided Script: #method-3-install-using-the-provided-script
Method 1: Install from Source (Using Your Own TVM Installation)
......@@ -93,7 +95,7 @@ If you already have a compatible TVM installation, follow these steps:
.. code:: bash
git clone --recursive https://github.com/tile-ai/tilelang
cd TileLang
cd tilelang
**Note**: Use the `--recursive` flag to include necessary submodules.
......@@ -114,7 +116,7 @@ If you already have a compatible TVM installation, follow these steps:
.. code:: bash
export PYTHONPATH=/your/path/to/tile-lang/python:$PYTHONPATH
export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH
# TVM_IMPORT_PYTHON_PATH is used by 3rd-party frameworks to import TVM
export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python
......@@ -128,7 +130,7 @@ If you prefer to use the built-in TVM version, follow these instructions:
.. code:: bash
git clone --recursive https://github.com/tile-ai/tilelang
cd TileLang
cd tilelang
**Note**: Ensure the `--recursive` flag is included to fetch submodules.
......@@ -155,7 +157,7 @@ If you prefer to use the built-in TVM version, follow these instructions:
.. code:: bash
export PYTHONPATH=/your/path/to/TileLang/python:$PYTHONPATH
export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH
Method 3: Install Using the Provided Script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -167,11 +169,11 @@ For a simplified installation, use the provided script:
.. code:: bash
git clone --recursive https://github.com/tile-ai/tilelang
cd TileLang
cd tilelang
2. **Run the Installation Script**:
.. code:: bash
bash install.sh
bash install_cuda.sh
# or bash `install_amd.sh` if you want to enable ROCm runtime