---
id: ref-triton-distributed
repo: ByteDance-Seed/Triton-distributed
title: Triton-distributed
url: https://github.com/ByteDance-Seed/Triton-distributed
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- nvidia
- rocm
- dcu
tags:
- triton
- distributed
- communication-overlap
- allreduce
- allgather
- reduce-scatter
- gemm
- moe
- flash-decode
- amd
- nvidia
techniques:
- compute-communication-overlap
- gemm-allreduce
- allgather-gemm
- reduce-scatter-overlap
- distributed-kernel
hardware_features:
- wavefront
- lds
- mfma
- interconnect
kernel_types:
- gemm
- attention
- moe
- communication
languages:
- python
- triton
- cpp
captured_at: '2026-05-26'
license: not-captured
source_paths:
- python
- lib
- include
- csrc
- docs
- tests
- README.md
---

# Triton-distributed

- Repository: `ByteDance-Seed/Triton-distributed`
- Source: [ByteDance-Seed/Triton-distributed](https://github.com/ByteDance-Seed/Triton-distributed)
- Docs: [Triton-distributed kernels](https://triton-distributed.readthedocs.io/en/latest/kernels/index.html)

## Route Fit

Use Triton-distributed when the optimization touches tensor parallelism, expert parallelism, MoE communication, GEMM + all-reduce, all-gather + GEMM, reduce-scatter overlap, or distributed flash decode. It is not the first source for single-kernel tuning, but it is a strong reference for overlap-aware Triton design.

## What To Inspect

- Distributed kernel examples and docs for communication overlap patterns.
- Tests for shape and process-group assumptions.
- Backend support notes; separate AMD-compatible ideas from NVIDIA-only paths.

## DCU Use Notes

For DCU, prove the communication backend, process topology, and profiler kernel presence before reusing overlap patterns. Treat NVIDIA-specific launch or interconnect assumptions as cross-platform inspiration only.

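
The three DCU checks above (communication backend, process topology, profiler kernel presence) can be gathered into one preflight gate. The following is a minimal sketch, not part of Triton-distributed: every function and class name is hypothetical, the backend strings and kernel-name substrings are placeholders you would replace with values observed on your own system, and it assumes a launcher that exports `RANK` and `WORLD_SIZE` the way `torchrun` does.

```python
from dataclasses import dataclass


@dataclass
class PreflightReport:
    """Outcome of the three checks; all names here are illustrative."""
    backend_ok: bool
    topology_ok: bool
    kernels_ok: bool

    @property
    def ready(self) -> bool:
        # Reuse an overlap pattern only when every check passed.
        return self.backend_ok and self.topology_ok and self.kernels_ok


def check_topology(env, expected_world_size):
    """Validate rank and world size taken from launcher-style variables.

    Assumption: the launcher exports RANK and WORLD_SIZE (as torchrun does).
    Adapt the keys if your launcher uses different names.
    """
    try:
        rank = int(env["RANK"])
        world = int(env["WORLD_SIZE"])
    except (KeyError, ValueError):
        return False
    return world == expected_world_size and 0 <= rank < world


def check_kernels(profiled_names, required_substrings):
    """Return True if each required substring matches some profiled kernel name."""
    return all(
        any(sub in name for name in profiled_names)
        for sub in required_substrings
    )


def preflight(env, backend_name, allowed_backends,
              profiled_names, required_substrings, expected_world_size):
    """Combine the backend, topology, and kernel-presence checks."""
    return PreflightReport(
        backend_ok=backend_name in allowed_backends,
        topology_ok=check_topology(env, expected_world_size),
        kernels_ok=check_kernels(profiled_names, required_substrings),
    )
```

In practice you would pass `os.environ` as `env`, the backend name reported by your distributed runtime as `backend_name`, and the kernel names exported from a profiler trace as `profiled_names`. Keeping the checks as plain functions over plain data means the gate can run in CI against a recorded trace without any GPU present.
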
## Query Hooks

```bash
python3 scripts/query.py "triton distributed allreduce gemm" --type source-reference --compact
python3 scripts/query.py "triton distributed moe reduce scatter" --type source-reference --compact
python3 scripts/get_page.py ref-triton-distributed
```