Unverified Commit 42c267e8 authored by Lei Wang's avatar Lei Wang Committed by GitHub
Browse files

[Target] Enhance target selection helpers and documentation (#1085)

* Improve target docs and helper messaging

  Commit Message:

  - add SUPPORTED_TARGETS metadata and expose describe_supported_targets()
  - relax target validation to accept option suffixes and upgrade error messages
  - document target usage and compute capability mapping in docs/get_started/targets.md
  - note preference for string targets when caching and link the new guide in docs/index.md

* remove american english spelling
parent 60e9c7e6
# Understanding Targets
TileLang is built on top of TVM, which relies on **targets** to describe the device you want to compile for.
The target determines which code generator is used (CUDA, HIP, Metal, LLVM, …) and allows you to pass
device-specific options such as GPU architecture flags. This page summarises how to pick and customise a target
when compiling TileLang programs.
## Common target strings
TileLang ships with a small set of common targets; each accepts the full range of TVM options so you can fine-tune
the generated code. The most frequent choices are listed below:
| Base name | Description |
| --------- | ----------- |
| `auto` | Detects CUDA → HIP → Metal in that order. Useful when running the same script across machines. |
| `cuda` | NVIDIA GPUs. Supports options such as `-arch=sm_80`, `-max_num_threads=1024`, etc. |
| `hip` | AMD GPUs via ROCm. Options like `-mcpu=gfx90a` can be appended. |
| `metal` | Apple Silicon GPUs (arm64 Macs). |
| `llvm` | CPU execution; accepts the standard TVM LLVM switches. |
| `webgpu` | Browser / WebGPU runtimes. |
| `c` | Emit plain C source for inspection or custom toolchains. |
To add options, append them after the base name, separated by spaces. For example:
```python
target = "cuda -arch=sm_90"
kernel = tilelang.compile(func, target=target, execution_backend="cython")
# or
@tilelang.jit(target=target)
def compiled_kernel(*args):
return func(*args)
```
The same convention works for HIP or LLVM (e.g. `hip -mcpu=gfx940`, `llvm -mtriple=x86_64-linux-gnu`).
### Advanced: Specify Exact Hardware
When you already know the precise GPU model, you can encode it in the target string—either via `-arch=sm_XX` or by
using one of TVM’s pre-defined target tags such as `nvidia/nvidia-h100`. Supplying this detail is optional for
TileLang in general use, but it becomes valuable when the TVM cost model is enabled (e.g. during autotuning). The
cost model uses the extra attributes to make better scheduling predictions. If you skip this step (or do not use the
cost model), generic targets like `cuda` or `auto` are perfectly fine.
All CUDA compute capabilities recognised by TVM’s target registry are listed below. Pick the one that matches your
GPU and append it to the target string or use the corresponding target tag—for example `nvidia/nvidia-a100`.
| Architecture | GPUs (examples) |
| ------------ | ---------------- |
| `sm_20` | `nvidia/tesla-c2050`, `nvidia/tesla-c2070` |
| `sm_21` | `nvidia/nvs-5400m`, `nvidia/geforce-gt-520` |
| `sm_30` | `nvidia/quadro-k5000`, `nvidia/geforce-gtx-780m` |
| `sm_35` | `nvidia/tesla-k40`, `nvidia/quadro-k6000` |
| `sm_37` | `nvidia/tesla-k80` |
| `sm_50` | `nvidia/quadro-k2200`, `nvidia/geforce-gtx-950m` |
| `sm_52` | `nvidia/tesla-m40`, `nvidia/geforce-gtx-980` |
| `sm_53` | `nvidia/jetson-tx1`, `nvidia/jetson-nano` |
| `sm_60` | `nvidia/tesla-p100`, `nvidia/quadro-gp100` |
| `sm_61` | `nvidia/tesla-p4`, `nvidia/quadro-p6000`, `nvidia/geforce-gtx-1080` |
| `sm_62` | `nvidia/jetson-tx2` |
| `sm_70` | `nvidia/nvidia-v100`, `nvidia/quadro-gv100` |
| `sm_72` | `nvidia/jetson-agx-xavier` |
| `sm_75` | `nvidia/nvidia-t4`, `nvidia/quadro-rtx-8000`, `nvidia/geforce-rtx-2080` |
| `sm_80` | `nvidia/nvidia-a100`, `nvidia/nvidia-a30` |
| `sm_86` | `nvidia/nvidia-a40`, `nvidia/nvidia-a10`, `nvidia/geforce-rtx-3090` |
| `sm_87` | `nvidia/jetson-agx-orin-32gb`, `nvidia/jetson-agx-orin-64gb` |
| `sm_89` | `nvidia/geforce-rtx-4090` |
| `sm_90a` | `nvidia/nvidia-h100` (DPX profile) |
| `sm_100a` | `nvidia/nvidia-b100` |
Refer to NVIDIA’s [CUDA GPUs](https://developer.nvidia.com/cuda-gpus) page or the TVM source
(`3rdparty/tvm/src/target/tag.cc`) for the latest mapping between devices and compute capabilities.
## Creating targets programmatically
If you prefer working with TVM’s `Target` objects, TileLang exposes the helper
`tilelang.utils.target.determine_target` (returns a canonical target string by default, or the `Target`
object when `return_object=True`):
```python
from tilelang.utils.target import determine_target
tvm_target = determine_target("cuda -arch=sm_80", return_object=True)
kernel = tilelang.compile(func, target=tvm_target)
```
You can also build targets directly through TVM:
```python
from tvm.target import Target
target = Target("cuda", host="llvm")
target = target.with_host(Target("llvm -mcpu=skylake"))
```
TileLang accepts either `str` or `Target` inputs; internally they are normalised and cached using the canonical
string representation. **In user code we strongly recommend passing target strings rather than
`tvm.target.Target` instances—strings keep cache keys compact and deterministic across runs, whereas constructing
fresh `Target` objects may lead to slightly higher hashing overhead or inconsistent identity semantics.**
## Discovering supported targets in code
Looking for a quick reminder of the built-in base names and their descriptions? Use:
```python
from tilelang.utils.target import describe_supported_targets
for name, doc in describe_supported_targets().items():
print(f"{name:>6}: {doc}")
```
This helper mirrors the table above and is safe to call at runtime (for example when validating CLI arguments).
## Troubleshooting tips
- If you see `Target cuda -arch=sm_80 is not supported`, double-check the spellings and that the option is valid for
TVM. Any invalid switch will surface as a target-construction error.
- Runtime errors such as “no kernel image is available” usually mean the `-arch` flag does not match the GPU you are
running on. Try dropping the flag or switching to the correct compute capability.
- When targeting multiple environments, use `auto` for convenience and override with an explicit string only when
you need architecture-specific tuning.
...@@ -14,6 +14,7 @@ low-level optimizations necessary for state-of-the-art performance. ...@@ -14,6 +14,7 @@ low-level optimizations necessary for state-of-the-art performance.
get_started/Installation get_started/Installation
get_started/overview get_started/overview
get_started/targets
::: :::
......
...@@ -74,7 +74,6 @@ column_limit = 100 ...@@ -74,7 +74,6 @@ column_limit = 100
indent_width = 4 indent_width = 4
[tool.codespell] [tool.codespell]
builtin = "clear,rare,en-GB_to_en-US"
ignore-words = "docs/spelling_wordlist.txt" ignore-words = "docs/spelling_wordlist.txt"
skip = [ skip = [
"build", "build",
......
...@@ -11,7 +11,7 @@ from tilelang.engine.param import CompiledArtifact, KernelParam ...@@ -11,7 +11,7 @@ from tilelang.engine.param import CompiledArtifact, KernelParam
from tilelang.jit.adapter import (BaseKernelAdapter, CtypesKernelAdapter, CythonKernelAdapter, from tilelang.jit.adapter import (BaseKernelAdapter, CtypesKernelAdapter, CythonKernelAdapter,
NVRTCKernelAdapter, TorchDLPackKernelAdapter, MetalKernelAdapter) NVRTCKernelAdapter, TorchDLPackKernelAdapter, MetalKernelAdapter)
from tilelang.profiler import Profiler, TensorSupplyType from tilelang.profiler import Profiler, TensorSupplyType
from tilelang.utils.target import AVALIABLE_TARGETS, determine_target from tilelang.utils.target import determine_target
import logging import logging
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
...@@ -90,13 +90,8 @@ class JITKernel(object): ...@@ -90,13 +90,8 @@ class JITKernel(object):
self.compile_flags = compile_flags self.compile_flags = compile_flags
# If the target is specified as a string, validate it and convert it to a TVM Target. # Ensure the target is always a valid TVM Target object.
if isinstance(target, str): self.target = determine_target(target, return_object=True)
assert target in AVALIABLE_TARGETS, f"Invalid target: {target}"
target = determine_target(target)
# Ensure the target is always a TVM Target object.
self.target = Target(target)
# Validate the execution backend. # Validate the execution backend.
assert execution_backend in [ assert execution_backend in [
......
from platform import mac_ver from platform import mac_ver
from typing import Literal, Union from typing import Dict, Literal, Union
from tilelang import tvm as tvm from tilelang import tvm as tvm
from tilelang import _ffi_api from tilelang import _ffi_api
from tvm.target import Target from tvm.target import Target
from tvm.contrib import rocm from tvm.contrib import rocm
from tilelang.contrib import nvcc from tilelang.contrib import nvcc
AVALIABLE_TARGETS = { SUPPORTED_TARGETS: Dict[str, str] = {
"auto", "auto": "Auto-detect CUDA/HIP/Metal based on availability.",
"cuda", "cuda": "CUDA GPU target (supports options such as `cuda -arch=sm_80`).",
"hip", "hip": "ROCm HIP target (supports options like `hip -mcpu=gfx90a`).",
"webgpu", "metal": "Apple Metal target for arm64 Macs.",
"c", # represent c source backend "llvm": "LLVM CPU target (accepts standard TVM LLVM options).",
"llvm", "webgpu": "WebGPU target for browser/WebGPU runtimes.",
"metal", "c": "C source backend.",
} }
def describe_supported_targets() -> Dict[str, str]:
"""
Return a mapping of supported target names to usage descriptions.
"""
return dict(SUPPORTED_TARGETS)
def check_cuda_availability() -> bool: def check_cuda_availability() -> bool:
""" """
Check if CUDA is available on the system by locating the CUDA path. Check if CUDA is available on the system by locating the CUDA path.
...@@ -90,11 +97,27 @@ def determine_target(target: Union[str, Target, Literal["auto"]] = "auto", ...@@ -90,11 +97,27 @@ def determine_target(target: Union[str, Target, Literal["auto"]] = "auto",
raise ValueError("No CUDA or HIP or MPS available on this system.") raise ValueError("No CUDA or HIP or MPS available on this system.")
else: else:
# Validate the target if it's not "auto" # Validate the target if it's not "auto"
assert isinstance( if isinstance(target, Target):
target, Target) or target in AVALIABLE_TARGETS, f"Target {target} is not supported" return_var = target
return_var = target elif isinstance(target, str):
normalized_target = target.strip()
if not normalized_target:
raise AssertionError(f"Target {target} is not supported")
try:
Target(normalized_target)
except Exception as err:
examples = ", ".join(f"`{name}`" for name in SUPPORTED_TARGETS)
raise AssertionError(
f"Target {target} is not supported. Supported targets include: {examples}. "
"Pass additional options after the base name, e.g. `cuda -arch=sm_80`."
) from err
return_var = normalized_target
else:
raise AssertionError(f"Target {target} is not supported")
if return_object: if return_object:
if isinstance(return_var, Target):
return return_var
return Target(return_var) return Target(return_var)
return return_var return return_var
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment