triton-distributed.md

---
id: ref-triton-distributed
repo: ByteDance-Seed/Triton-distributed
title: Triton-distributed
url: https://github.com/ByteDance-Seed/Triton-distributed
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- nvidia
- rocm
- dcu
tags:
- triton
- distributed
- communication-overlap
- allreduce
- allgather
- reduce-scatter
- gemm
- moe
- flash-decode
- amd
- nvidia
techniques:
- compute-communication-overlap
- gemm-allreduce
- allgather-gemm
- reduce-scatter-overlap
- distributed-kernel
hardware_features:
- wavefront
- lds
- mfma
- interconnect
kernel_types:
- gemm
- attention
- moe
- communication
languages:
- python
- triton
- cpp
captured_at: '2026-05-26'
license: not-captured
source_paths:
- python
- lib
- include
- csrc
- docs
- tests
- README.md
---
# Triton-distributed

- Repository: `ByteDance-Seed/Triton-distributed`
- Source: [ByteDance-Seed/Triton-distributed](https://github.com/ByteDance-Seed/Triton-distributed)
- Docs: [Triton-distributed kernels](https://triton-distributed.readthedocs.io/en/latest/kernels/index.html)

## Route Fit

Use Triton-distributed when the optimization touches tensor parallelism, expert
parallelism, MoE communication, GEMM + all-reduce, all-gather + GEMM,
reduce-scatter overlap, or distributed flash decode. For single-kernel tuning,
consult other sources first; this repository is most useful as a reference for
Triton kernels designed around compute-communication overlap.
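
To make the all-gather + GEMM pattern concrete, the sketch below simulates it in
plain Python with no GPUs and no Triton-distributed calls. It only shows why the
overlap is legal: the product of the gathered activation with the local weight
can be computed one shard at a time, so each tile GEMM can start as soon as its
shard arrives. The names `matmul` and `allgather_gemm_chunked`, and the ring
ordering, are illustrative assumptions and are not taken from the repository.

```python
def matmul(a, b):
    """Naive GEMM on lists of lists (row-major)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def allgather_gemm_chunked(a_shards, b_local, rank):
    """Compute all_gather(A) @ B_local one row shard at a time.

    The local shard is processed first because it is already resident; the
    remaining shards follow in ring order. On real hardware the transfer of
    the next shard would be in flight while the current tile GEMM runs. Here
    the steps run serially, so only the decomposition is demonstrated.
    """
    world_size = len(a_shards)
    blocks = [None] * world_size
    for step in range(world_size):
        src = (rank + step) % world_size  # ring order starting from this rank
        blocks[src] = matmul(a_shards[src], b_local)
    # Stitch the row blocks back together in rank order.
    return [row for block in blocks for row in block]


# Two simulated ranks, each holding a 2x2 row shard of A; B_local is 2x2.
a_shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b_local = [[1, 2], [3, 4]]

full_a = [row for shard in a_shards for row in shard]
reference = matmul(full_a, b_local)
chunked = allgather_gemm_chunked(a_shards, b_local, rank=1)
print(chunked == reference)  # True
```

The same per-shard decomposition is what an overlap-aware kernel relies on; the
real implementation additionally needs signalling so a tile does not start
before its shard has landed.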

## What To Inspect

- Distributed kernel examples and docs for communication overlap patterns.
- Tests, for the tensor shapes and process-group layouts the kernels assume.
- Backend support notes; separate AMD-compatible ideas from NVIDIA-only paths.
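
When reading the tests for shape and process-group assumptions, it can help to
write the expected constraints down as an executable check. The sketch below
encodes a common tensor-parallel convention, not the repository's actual rules:
the helper `check_tp_shapes`, the pattern names, and the divisibility
requirements are all assumptions to be confirmed against the tests.

```python
def check_tp_shapes(pattern, m, n, k, world_size):
    """Return a list of problems for a tensor-parallel GEMM; empty means OK.

    Assumed layouts (illustrative, not taken from the repository):
      "ag_gemm": A[M, K] row-sharded over ranks, B[K, N] column-sharded.
      "gemm_rs": A and B sharded along K, output C[M, N] row-scattered.
    """
    if world_size < 1:
        return ["world_size must be >= 1"]
    required = {
        "ag_gemm": {"M": m, "N": n},
        "gemm_rs": {"K": k, "M": m},
    }
    if pattern not in required:
        return [f"unknown pattern {pattern!r}"]
    return [
        f"{name}={value} is not divisible by world_size={world_size}"
        for name, value in required[pattern].items()
        if value % world_size != 0
    ]


print(check_tp_shapes("ag_gemm", m=4096, n=8192, k=4096, world_size=8))
# []
print(check_tp_shapes("gemm_rs", m=4096, n=8192, k=4100, world_size=8))
# ['K=4100 is not divisible by world_size=8']
```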

## DCU Use Notes

For DCU, verify three things before reusing overlap patterns: that the
communication backend works, that the process topology matches what the kernels
expect, and that the expected kernels actually appear in profiler traces. Treat
NVIDIA-specific launch or interconnect assumptions as cross-platform
inspiration only.
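
The profiler check can be reduced to a small script once kernel names have been
exported from whichever profiler is in use. The sketch below assumes the names
are already available as a list of strings; the helper `missing_kernels` and the
kernel names in the example are hypothetical placeholders, not names taken from
Triton-distributed or from any DCU tool.

```python
def missing_kernels(profiled_names, expected_substrings):
    """Return the expected substrings that match no profiled kernel name.

    Matching is case-insensitive and substring-based, because compiled
    kernel names usually carry prefixes or suffixes added by the toolchain.
    """
    lowered = [name.lower() for name in profiled_names]
    return [
        want
        for want in expected_substrings
        if not any(want.lower() in name for name in lowered)
    ]


# Hypothetical names standing in for a real profiler export.
profiled = ["kernel_ag_gemm_consumer", "elementwise_add", "copy_engine_p2p"]
expected = ["ag_gemm", "reduce_scatter"]
print(missing_kernels(profiled, expected))  # ['reduce_scatter']
```

An empty result only shows that the kernels launched; whether communication and
compute actually overlapped still has to be read from the trace timeline.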

## Query Hooks

```bash
python3 scripts/query.py "triton distributed allreduce gemm" --type source-reference --compact
python3 scripts/query.py "triton distributed moe reduce scatter" --type source-reference --compact
python3 scripts/get_page.py ref-triton-distributed
```