---
id: ref-flaggems
repo: flagos-ai/FlagGems
title: FlagGems Triton Operator Library
url: https://github.com/flagos-ai/FlagGems
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- nvidia
- rocm
- dcu
tags:
- triton
- flaggems
- pytorch
- operator-library
- backend-neutral
- multi-backend
- aten
- normalization
- reduction
- elementwise
- quantization
techniques:
- pytorch-operator-replacement
- backend-neutral-triton
- test-matrix
- benchmark-matrix
- operator-coverage
hardware_features:
- wavefront
- lds
- vectorization
- cache
kernel_types:
- normalization
- reduction
- elementwise
- activation
- quantization
- gemm
languages:
- python
- triton
captured_at: '2026-05-26'
license: not-captured
source_paths:
- src/flag_gems
- benchmark
- tests
- modules_tests
- experimental_tests
- triton_src
- docs
- README.md
---

# FlagGems Triton Operator Library

- Repository: `flagos-ai/FlagGems`
- Source: [flagos-ai/FlagGems](https://github.com/flagos-ai/FlagGems)

## Route Fit

Use FlagGems when the Triton task is a PyTorch-style operator, normalization, reduction, activation, elementwise fusion, or backend-neutral replacement. It is less LLM-serving-specific than AITER or Conch, but it is valuable for portable Triton operator structure, tests, and benchmark organization.

## What To Inspect

- `src/flag_gems` and `triton_src` for operator implementations.
- `tests`, `modules_tests`, and `experimental_tests` for dtype/shape coverage.
- `benchmark` for performance harness layout and comparison policy.

## DCU Use Notes

Treat FlagGems constants as hypotheses. Its portability makes it useful for syntax and wrapper design, but final tuning still needs DCU profiler, IR/ISA, and target-version proof.
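
One way to act on "constants as hypotheses" is to re-measure an inherited launch configuration against a few alternatives on the target device before keeping it. The sketch below is a minimal, stdlib-only illustration of that workflow. The `Config` shape, `sweep`, `fake_measure`, and every number in it are hypothetical and are not taken from FlagGems; on real hardware `measure` would wrap an actual timer (for example `triton.testing.do_bench`) around the kernel launch.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical tuning knobs: (block_size, num_warps). Names are illustrative only.
Config = Tuple[int, int]


def sweep(
    inherited: Config,
    candidates: Iterable[Config],
    measure: Callable[[Config], float],
) -> Tuple[Config, Dict[Config, float]]:
    """Measure the inherited config and every candidate; return the fastest.

    `measure` returns a cost where lower is better (for example milliseconds).
    """
    timings: Dict[Config, float] = {}
    for cfg in [inherited, *candidates]:
        if cfg not in timings:  # time each distinct config only once
            timings[cfg] = measure(cfg)
    best = min(timings, key=timings.get)
    return best, timings


def fake_measure(cfg: Config) -> float:
    """Stand-in cost model, NOT hardware data: favors block_size=256, num_warps=4."""
    block_size, num_warps = cfg
    return abs(block_size - 256) / 256 + abs(num_warps - 4) / 4


# Pretend this constant was copied from an upstream kernel (assumed value).
inherited = (1024, 8)
candidates = [(128, 4), (256, 4), (512, 8)]

best, timings = sweep(inherited, candidates, fake_measure)
kept_inherited = best == inherited  # False here: the inherited value lost the sweep
```

A sweep like this only ranks candidates by measured time. It does not replace the profiler, IR/ISA, and target-version checks named above.
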
## Query Hooks

```bash
python3 scripts/query.py "flaggems triton rmsnorm reduction" --type source-reference --compact
python3 scripts/query.py "flaggems triton pytorch operator" --type source-reference --compact
python3 scripts/get_page.py ref-flaggems
```