
## opus: AI (o)(p)erator Micro(u) (s)td
*Crafting the micro standard templates for AI Operators on ROCm*
## About
**opus** is a lightweight, templated C++ DSL designed to accelerate the development of HIP/C++ kernels for AMD GPUs. Inspired by projects such as [ck/ck_tile](https://github.com/ROCm/composable_kernel) and [cutlass/cute](https://github.com/NVIDIA/cutlass), **opus** adopts a significantly simplified design while prioritizing maintainability.
Distributed as a single-header library (`opus.hpp`), **opus** provides only essential abstractions. This constraint requires careful trade-offs whenever a new concept is introduced. For instance, **opus** deliberately avoids a unified `tensor` class, which would typically combine a data provider (a pointer or a register array/tuple) with a layout descriptor (for index calculation), and instead keeps the two as distinct classes. This design preserves the flexibility of manual index computation while keeping the code clear. As a result, in terms of abstraction level, **opus** sits **above hand-written HIP kernels** yet **below highly optimized template libraries like ck/cutlass**.
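
The separation of data provider and layout descriptor can be sketched in plain host-side C++. This is only an illustration of the idea, not the opus API: `row_major_layout`, `load`, `load_manual`, and `layout_matches_manual` are hypothetical names introduced here, and the real classes in `opus.hpp` may look quite different. The layout object only maps a coordinate to an offset, while the data stays behind an ordinary pointer, so a hand-computed index remains a valid alternative; `assert(layout_matches_manual());` checks that both paths read the same element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical layout descriptor (NOT the opus API): it only maps a
// 2-D coordinate to a linear offset and owns no data.
struct row_major_layout {
    std::size_t rows;
    std::size_t cols;
    std::size_t stride;  // elements between the starts of consecutive rows

    constexpr std::size_t operator()(std::size_t r, std::size_t c) const {
        return r * stride + c;
    }
};

// The data provider is a plain pointer; the layout is passed separately.
inline float load(const float* data, const row_major_layout& layout,
                  std::size_t r, std::size_t c) {
    return data[layout(r, c)];
}

// Manual index computation stays possible because the pointer is not
// hidden inside a combined tensor object.
inline float load_manual(const float* data, std::size_t stride,
                         std::size_t r, std::size_t c) {
    return data[r * stride + c];
}

// Small self-check: the layout-based and the manual path agree.
inline bool layout_matches_manual() {
    const std::array<float, 8> buf = {0, 1, 2, 3, 4, 5, 6, 7};
    const row_major_layout layout{2, 3, 4};  // 2x3 view over rows padded to 4
    return load(buf.data(), layout, 1, 2) ==
           load_manual(buf.data(), 4, 1, 2);
}
```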
If you are looking for:
- AMDGPU data type declaration and conversion
- Automated vectorized buffer load/store dispatch (without manual implementation)
- Support for various matrix core instructions with minimal code changes when switching MFMA types
- A collection of utility device functions
- (Optional) Simple and intuitive layout abstractions to streamline index calculations

then **opus** is a good choice for you.
However, if you are looking for:
- Pre-optimized kernels (e.g., GEMM, attention, reduction) for direct use
- Reusable device-side pipelines for GEMM/attention/reduction
- A comprehensive layout system capable of describing arbitrary tensor transformations

then **opus** is not the right fit; consider alternatives such as `ck` or `aiter` kernels.
## File structure
```
csrc/include/opus/
├── opus.hpp # Single-header library (all you need to include)
├── logo.png # Logo
└── README.md # This file
```
## Usage
Include the header in your HIP/C++ source:
```cpp
#include "opus/opus.hpp"
```
No separate build step is required — just make sure `csrc/include/` is on your include path.
## Design
The **opus** source code is structured into two logical sections within a single header file:
- The first half contains device-independent structures, containers, and utility functions (number, seq, array, tuple, layout, etc.)
- The second half includes architecture-specific device functions, such as buffer load/store operations and MFMA instructions

Below, we illustrate the usage of **opus** through a naive GEMM example.
### Naive GEMM using opus
#### 1. Vectorized load/store
Loading data from global memory can be as simple as pointer dereferencing:
```cpp
int offset_a = (threadIdx.x / 32 * 4) + (threadIdx.x % 32 * stride_a);
fp16x4_t v_a = *reinterpret_cast