---
id: ref-stackav-conch
repo: stackav-oss/conch
title: Conch Triton Kernel Standard Library
url: https://github.com/stackav-oss/conch
source_type: source-reference
source_category: open-triton-kernel-library
architectures:
- amd
- rocm
- nvidia
- dcu
tags:
- triton
- conch
- standard-library
- rocm
- paged-attention
- varlen-attention
- rmsnorm
- rotary
- kv-cache
- fp8
- int8
- quantization
- vllm
techniques:
- pytorch-reference
- microbenchmark
- unit-test
- direct-file-harness
- kernel-wrapper-pattern
hardware_features:
- wavefront
- lds
- mfma
- cache
kernel_types:
- attention
- paged-attention
- normalization
- rotary
- quantization
- kv-cache
languages:
- python
- triton
captured_at: '2026-05-26'
license: Apache-2.0
source_paths:
- conch
- tests
- benchmarks
- README.md
- pyproject.toml
---
# Conch Triton Kernel Standard Library

- Repository: `stackav-oss/conch`
- Source: [stackav-oss/conch](https://github.com/stackav-oss/conch)
- Package: [conch-triton-kernels](https://pypi.org/project/conch-triton-kernels/)
- License: `Apache-2.0`

## Route Fit

Use Conch as a high-quality open Triton kernel reference for direct-file mode
and vLLM-adjacent serving kernels. It is useful when the task needs a PyTorch
reference, unit test, microbenchmark, launch wrapper, or standalone Triton file
that can be adapted into `.humanize/triton-agent/` harnesses.
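
The reference-plus-tolerance pattern mentioned above can be sketched as follows. This is a minimal illustration, assuming RMSNorm as the kernel under test and using plain Python lists so it runs without a GPU. In a real harness the reference would be a PyTorch function and the candidate a Triton launch wrapper. The names `rmsnorm_reference`, `candidate_rmsnorm`, and `is_close` are hypothetical and are not Conch APIs.

```python
import math


def rmsnorm_reference(x, weight, eps=1e-6):
    """Reference RMSNorm for one row: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def candidate_rmsnorm(x, weight, eps=1e-6):
    """Hypothetical stand-in for a kernel wrapper.

    It computes the same quantity in a different operation order, which is
    the usual reason a kernel differs from its reference by a few ULPs.
    """
    n = len(x)
    scale = math.sqrt(n / (sum(v * v for v in x) + eps * n))
    return [w * (v * scale) for v, w in zip(x, weight)]


def is_close(actual, expected, atol=1e-5, rtol=1e-5):
    """Element-wise check: |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


row = [0.5, -1.25, 3.0, 2.0]
gain = [1.0, 0.5, 2.0, 1.0]
assert is_close(candidate_rmsnorm(row, gain), rmsnorm_reference(row, gain))
```

The tolerances here are placeholders. Reduced-precision dtypes such as fp8 or int8 generally need looser bounds, and the values used in the repository's `tests` are the ones to consult.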

## What To Inspect

- Paged attention, varlen attention, rotary, RMSNorm, KV-cache, and quantized
  utility kernels.
- `tests` for correctness tolerances and edge cases.
- `benchmarks` for warmed-up timing and direct wrapper invocation patterns.
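
A warmed-up timing loop of the kind the `benchmarks` directory is cited for might look like the sketch below. It is a generic standard-library illustration, not Conch's benchmark code. The `benchmark` helper and its parameters are hypothetical, and the optional `sync` callback is an assumption about how a GPU harness would force completion (for example by passing `torch.cuda.synchronize`).

```python
import statistics
import time


def benchmark(fn, *args, warmup=5, iters=20, sync=None):
    """Time fn(*args) after discarding warmup runs; returns stats in ms."""
    # Warmup runs absorb one-time costs such as JIT compilation and
    # cache population, so they do not pollute the timed samples.
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        # Asynchronous launches return early; wait before stopping the clock.
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "iters": iters,
    }


# Usage with a CPU placeholder workload standing in for a kernel wrapper.
stats = benchmark(sum, range(10_000), warmup=3, iters=10)
print(f"median {stats['median_ms']:.4f} ms over {stats['iters']} runs")
```

Reporting the median rather than the mean keeps a single slow outlier from dominating the result.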

## DCU Use Notes

Conch is useful because it is closer to direct-file Triton work than framework
backend code. Still verify ROCm/DCU behavior with local profiler names, cache
entries, and target `gcnArchName`; do not copy tuning constants blindly.
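
One way to record the target architecture before comparing results is sketched below. It assumes a ROCm build of PyTorch, where the device properties are expected to expose a `gcnArchName` attribute. The helper names `base_arch` and `detect_gcn_arch` are hypothetical, and the colon-separated feature-suffix format handled by `base_arch` is an assumption that should be checked against the local device.

```python
def base_arch(gcn_arch_name):
    """Strip feature suffixes, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0]


def detect_gcn_arch(device_index=0):
    """Return the base arch of a device, or None if it cannot be determined."""
    try:
        import torch  # optional dependency; absent on machines without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    # The attribute is only expected on ROCm builds, hence the getattr guard.
    name = getattr(props, "gcnArchName", None)
    return base_arch(name) if name else None


print(detect_gcn_arch())
```

Logging this value alongside each benchmark result makes it easier to see when a tuning constant was measured on a different architecture than the one under test.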

## Query Hooks

```bash
python3 scripts/query.py "conch triton paged attention" --type source-reference --compact
python3 scripts/query.py "conch rmsnorm rotary fp8" --type source-reference --compact
python3 scripts/query.py "conch direct file harness triton" --type source-reference --compact
python3 scripts/get_page.py ref-stackav-conch
```