---
id: ref-rocm-aiter
repo: ROCm/aiter
title: AITER AI Tensor Engine for ROCm
url: https://github.com/ROCm/aiter
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- rocm
- dcu
tags:
- triton
- rocm
- aiter
- vllm
- sglang
- attention
- mla
- paged-attention
- fused-moe
- gemm
- rmsnorm
- quantization
- communication
techniques:
- backend-dispatch
- triton-kernel-reference
- triton-comms
- aiter-fallback-map
- rocm-first-validation
hardware_features:
- wavefront
- lds
- mfma
- mmac
- gfx
kernel_types:
- attention
- mla
- moe
- gemm
- normalization
- quantization
- communication
languages:
- python
- triton
- cpp
- hip
captured_at: '2026-05-26'
license: MIT
source_paths:
- aiter
- op_tests
- docs
- gradlib
- csrc
- requirements-triton-comms.txt
- .github/scripts/install_triton.sh
- README.md
---
# AITER AI Tensor Engine for ROCm

- Repository: `ROCm/aiter`
- Source: [ROCm/aiter](https://github.com/ROCm/aiter)
- License: `MIT`

## Route Fit

Use AITER as the first-choice upstream reference when optimizing Triton kernels
that compete with, wrap, or fall back to AITER paths in vLLM or SGLang. It is
especially useful for ROCm- and DCU-facing backend dispatch, attention and MLA,
fused MoE, RMSNorm, quantized GEMM, and communication-related Triton kernels.

## What To Inspect

- Backend selection and fallback logic around AITER versus Triton.
- `op_tests` for shape coverage, tolerances, and reproducible correctness.
- Triton communication docs and install scripts when a task touches distributed
  overlap, all-reduce, or tensor/expert parallel serving.
- Kernel names and wrapper APIs that vLLM or SGLang may already recognize.
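
The backend selection and fallback item above can be pictured with a minimal
sketch. It assumes only that the package is importable as `aiter` (inferred
from the `aiter` source path). The function `pick_backend` and the
`PREFER_AITER` environment variable are hypothetical names made up for
illustration; they are not flags or APIs of AITER, vLLM, or SGLang.

```python
import importlib.util
import os


def pick_backend(env=None, probe=None):
    """Return (backend, reason) for a hypothetical AITER-vs-Triton dispatch.

    env   -- mapping of environment variables (defaults to os.environ)
    probe -- callable(name) -> bool reporting whether a module is importable
    """
    env = os.environ if env is None else env
    if probe is None:
        # find_spec returns None when a top-level module cannot be located.
        probe = lambda name: importlib.util.find_spec(name) is not None

    # Hypothetical opt-out switch; real frameworks use their own flags.
    if env.get("PREFER_AITER", "1") == "0":
        return "triton", "disabled by PREFER_AITER=0"
    if probe("aiter"):
        return "aiter", "aiter package importable"
    # Fall back to the Triton path when AITER is not installed.
    return "triton", "aiter package not found"


if __name__ == "__main__":
    backend, reason = pick_backend()
    print(f"backend={backend} ({reason})")
```

Returning the reason alongside the choice keeps the decision auditable, which
is the property to look for when reading the real dispatch code in a framework.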

## DCU Use Notes

Treat AITER as strong evidence for ROCm behavior, but still verify the exact
local framework path and the generated Triton kernel on the target DCU. Do not
assume an AITER kernel is selected unless backend logs, environment flags,
profiler kernel names, or a direct wrapper benchmark confirm it.
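
One way to apply the profiler-name check is to bucket the kernel names from a
trace by substring. The sketch below is illustrative only: the marker
substrings (`"aiter"`, `"triton"`) and the sample kernel names are assumptions,
and real names depend on the framework build and the profiler in use.

```python
from collections import Counter


def classify_kernels(kernel_names, aiter_markers=("aiter",), triton_markers=("triton",)):
    """Count kernel names per suspected backend using substring markers.

    The markers are assumptions; adjust them to the names actually seen
    in the profiler output on the target machine.
    """
    counts = Counter()
    for name in kernel_names:
        lowered = name.lower()
        if any(marker in lowered for marker in aiter_markers):
            counts["aiter"] += 1
        elif any(marker in lowered for marker in triton_markers):
            counts["triton"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Invented sample names, standing in for a real profiler export.
    sample = ["aiter_example_kernel", "triton_example_kernel", "memcpy_example"]
    for backend, count in sorted(classify_kernels(sample).items()):
        print(backend, count)
```

A zero count in the `aiter` bucket does not prove that AITER is unused, since
kernel names may not carry the marker; combine it with backend logs or a direct
wrapper benchmark.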

## Query Hooks

```bash
python3 scripts/query.py "aiter triton mla rocm" --type source-reference --compact
python3 scripts/query.py "aiter fused moe triton" --type source-reference --compact
python3 scripts/query.py "aiter triton comms allreduce" --type source-reference --compact
python3 scripts/get_page.py ref-rocm-aiter
```