Unverified Commit 55293631 authored by Lei Wang's avatar Lei Wang Committed by GitHub
Browse files

[Bugfix] Expose alloc_reducer definition to the python side (#802)

- Introduced a new function `alloc_reducer` to allocate a reducer buffer with specified shape, data type, and reduction operation (sum, max, min).
- Added detailed documentation for the function, including usage instructions and parameter descriptions.
- Ensured that the function supports replication strategies and includes assertions for valid operation types and replication options.

This enhancement improves the functionality of buffer management in TileLang, facilitating efficient reduction operations in parallel loops.
parent 91a7bb2b
...@@ -41,6 +41,7 @@ from .allocate import ( ...@@ -41,6 +41,7 @@ from .allocate import (
alloc_shared, # noqa: F401 alloc_shared, # noqa: F401
alloc_fragment, # noqa: F401 alloc_fragment, # noqa: F401
alloc_barrier, # noqa: F401 alloc_barrier, # noqa: F401
alloc_reducer, # noqa: F401
) )
from .copy import copy, c2d_im2col # noqa: F401 from .copy import copy, c2d_im2col # noqa: F401
from .gemm import GemmWarpPolicy, gemm, gemm_v2 # noqa: F401 from .gemm import GemmWarpPolicy, gemm, gemm_v2 # noqa: F401
......
...@@ -87,3 +87,40 @@ def alloc_barrier(arrive_count: int): ...@@ -87,3 +87,40 @@ def alloc_barrier(arrive_count: int):
T.Buffer: A TVM buffer object allocated as a barrier T.Buffer: A TVM buffer object allocated as a barrier
""" """
return T.alloc_buffer([arrive_count], "uint64", scope="shared.barrier") return T.alloc_buffer([arrive_count], "uint64", scope="shared.barrier")
def alloc_reducer(shape, dtype, op="sum", replication=None):
"""
Allocate a reducer buffer.
Modifications needs to conform with `op`,
such as `op="sum"` requires `reducer[...] += ...` and
`op="max"` requires `reducer[...] = T.max(reducer[...], ...)`.
Only after T.fill with proper initializer the reduction may begin;
only after T.finalize_reducer the partial results will be available.
For `op="sum"`, filled value must be 0; for min and max, the filled initializer will become max or min clamper correspondingly.
You may want to use `T.max_value` for min and `T.min_value` for max.
Args:
shape (tuple): The shape of the buffer to allocate
dtype (str): The data type of the buffer (e.g., 'float32', 'int32')
op (str): The reduce operation corresponded with the reducer
replication (str | None): Replication strategy, can be "all" or "none". Defaults to not specified, and the compiler will do whatever it want.
Returns:
T.Buffer: A TVM buffer object allocated in thread-private storage, available to reduce values in T.Parallel loops.
"""
import tilelang.language as TL
assert op in ["sum", "max", "min"]
# TODO: support automatic layout
if replication is None:
replication = "none"
assert replication in ["all", "none"]
reducer = T.alloc_buffer(shape, dtype, scope="local.fragment")
TL.block_attr({"reducer_info": {reducer.data: {"rep": replication, "op": op}}})
return reducer
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment