---
id: ref-flaggems
repo: flagos-ai/FlagGems
title: FlagGems Triton Operator Library
url: https://github.com/flagos-ai/FlagGems
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- nvidia
- rocm
- dcu
tags:
- triton
- flaggems
- pytorch
- operator-library
- backend-neutral
- multi-backend
- aten
- normalization
- reduction
- elementwise
- quantization
techniques:
- pytorch-operator-replacement
- backend-neutral-triton
- test-matrix
- benchmark-matrix
- operator-coverage
hardware_features:
- wavefront
- lds
- vectorization
- cache
kernel_types:
- normalization
- reduction
- elementwise
- activation
- quantization
- gemm
languages:
- python
- triton
captured_at: '2026-05-26'
license: not-captured
source_paths:
- src/flag_gems
- benchmark
- tests
- modules_tests
- experimental_tests
- triton_src
- docs
- README.md
---
# FlagGems Triton Operator Library

- Repository: `flagos-ai/FlagGems`
- Source: [flagos-ai/FlagGems](https://github.com/flagos-ai/FlagGems)

## Route Fit

Use FlagGems when the Triton task is a PyTorch-style operator: normalization,
reduction, activation, elementwise fusion, or a backend-neutral replacement for
an existing operator. It is less LLM-serving-specific than AITER or Conch, but
it is a useful reference for portable Triton operator structure, tests, and
benchmark organization.

## What To Inspect

- `src/flag_gems` and `triton_src` for operator implementations.
- `tests`, `modules_tests`, and `experimental_tests` for dtype/shape coverage.
- `benchmark` for performance harness layout and comparison policy.
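
As a starting point for that inspection, the sketch below counts Python files under each of the directories listed above in a local checkout. The function name `summarize_checkout` and the checkout location are hypothetical; the directory names come from this page, and a given revision of the repository may not contain all of them.

```python
from pathlib import Path

# Directory names taken from this page; the checkout path itself is an assumption.
INSPECT_DIRS = (
    "src/flag_gems",
    "triton_src",
    "tests",
    "modules_tests",
    "experimental_tests",
    "benchmark",
)


def summarize_checkout(root):
    """Return {directory: number of .py files}, or None for a missing directory."""
    root = Path(root)
    summary = {}
    for rel in INSPECT_DIRS:
        path = root / rel
        if not path.is_dir():
            summary[rel] = None  # directory absent in this revision
            continue
        summary[rel] = sum(1 for _ in path.rglob("*.py"))
    return summary


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel, count in summarize_checkout(root).items():
        print(f"{rel:20s} {'missing' if count is None else count}")
```

A directory reported as missing is a hint that the layout has changed since this page was captured, so check it before relying on the paths above.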

## DCU Use Notes

Treat FlagGems tuning constants as hypotheses, not as settled values. Its
portability makes it useful for syntax and wrapper design, but final tuning
still needs DCU profiler data, IR/ISA inspection, and confirmation on the
target version.
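
One way to act on "constants as hypotheses" is to re-measure each candidate value on the target device instead of inheriting it. The sketch below is a hardware-agnostic timing harness, not anything taken from FlagGems: `launch` is a hypothetical callable standing in for whatever wraps the kernel under test, and the candidate block sizes are illustrative. On a real device the callable must synchronize before returning, otherwise the timing captures only launch overhead.

```python
import statistics
import time


def sweep(launch, candidates, warmup=3, repeats=10):
    """Time launch(candidate) per candidate; return (best, {candidate: median seconds})."""
    results = {}
    for cand in candidates:
        for _ in range(warmup):
            launch(cand)  # warm-up runs absorb JIT compile and cache effects
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            launch(cand)
            samples.append(time.perf_counter() - start)
        results[cand] = statistics.median(samples)
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Stand-in workload: a CPU loop whose cost depends on the candidate.
    def fake_launch(block_size):
        total = 0
        for i in range(0, 1 << 16, block_size):
            total += i
        return total

    best, timings = sweep(fake_launch, candidates=(64, 128, 256, 512))
    for cand, seconds in timings.items():
        print(f"BLOCK_SIZE={cand:4d} median={seconds * 1e6:8.1f} us")
    print("fastest on this machine:", best)
```

The median is used rather than the mean so that a single slow outlier does not decide the result. A wall-clock sweep like this only ranks candidates; it does not replace the profiler and IR/ISA checks named above.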

## Query Hooks

```bash
python3 scripts/query.py "flaggems triton rmsnorm reduction" --type source-reference --compact
python3 scripts/query.py "flaggems triton pytorch operator" --type source-reference --compact
python3 scripts/get_page.py ref-flaggems
```