whlwhlwhl / Lightop-SKIILS
Commit 6889486d, authored May 22, 2026 by whlwhlwhl
Add development conventions (添加开发规范)
parent 73900188
Showing 2 changed files with 84 additions and 11 deletions
docs/lightop-skills.zh-CN.md (+22, -2)
humanize/skills/humanize-kernel-agent-loop/SKILL.md (+62, -9)
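docs/lightop-skills.zh-CN.md
...
@@ -104,6 +104,17 @@ test/<family>/*benchmark*.py
```
lightop/config*.py
```

LightOp operator sources use `.cu`. Do not hand-write or commit `.hip` operator
source files; if the build generates `.hip` files automatically, they are build
artifacts, not source that should land in the repository for this task.

All changes must follow the existing LightOp development conventions. Before
writing code, the agent must find the nearest same-family implementation and use
it as the reference, following its directory layout, file naming, C++
namespace/include/launch helper style, wrapper argument validation, `export.cpp`
bindings, config/dispatcher, test, and benchmark style. Do not introduce
unrelated dependencies, foreign project layouts, bulk reformatting, generated
source, or changes to unrelated operator families unless the user explicitly
asks and the reason is recorded in the plan. Before delivery, run a conformance
self-check: list which local LightOp files each modified file was modeled on,
explain any intentional deviations, and confirm that there are no hand-written
`.hip` sources, no unrelated changes, and no more than one final task test entry
point under `test/`.

When adding a fused operator, the agent should first search for the unfused
single operators and related fused implementations, and use the local LightOp
paths as the primary baseline for API, validation, benchmark, and performance
expectations. Only when no local baseline is found should it record that the
search came up empty and fall back to PyTorch or a literal oracle.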
...
@@ -120,7 +131,9 @@ lightop/config*.py
- Modify `lightop/config*.py` or the dispatch table only when shape/gfx
  specialization is needed.
- The final test script must live under `test/` and be named
  `test_<operator_name>.py`. Do not leave scripts such as `final_test.py` or
  `bench_baseline.py` only under `.humanize/lightop-agent/`; `.humanize` is only
  for process records and temporary evidence, not the final LightOp test entry
  point. Each task adds or uses only this one official unit-test entry point
  under `test/`; candidate tests, benchmark harnesses, sweep configs, PMC
  parsing scripts, and other auxiliary material all go under
  `.humanize/lightop-agent/`.

### Existing-Operator Optimization Checklist
...
@@ -136,12 +149,15 @@ lightop/config*.py
- Install, test, benchmark, profile, and tune directly in the specified
  environment.
- For an existing-operator optimization task, the user only needs to specify the
  target operator and the test file; if the user provides a test file, use it as
  the final validation file. If not, infer or create
  `test/test_<operator_name>.py`. Do not pile up extra task scripts under
  `test/`; put other helper scripts under `.humanize/lightop-agent/`.
- If an optimization line regresses or only helps non-target shapes, record the
  reason for rejecting it.

### DCU/ROCm Default Rules

- In LightOp tests, treat PyTorch's `torch.cuda` namespace as the ROCm runtime
  facade.
- Prefer `hipcc` and ROCm extension builds.
- LightOp kernel sources land as `.cu`; do not generate or commit hand-written
  `.hip` operator files. If `.hip` files appear, treat them as automatically
  generated build products.
- Respect `PYTORCH_ROCM_ARCH`; when it is unset, infer the architecture from
  `gcnArchName`.
- Do not introduce NVIDIA-specific assumptions such as CUDA-only headers,
  PTX/SASS, CUTLASS/CuTe, Nsight Compute, or TMA/WGMMA.
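
The `PYTORCH_ROCM_ARCH` rule above can be sketched as a small helper. This is only an illustration: the function name `resolve_rocm_arch` is hypothetical, and stripping feature suffixes such as `:sramecc+:xnack-` from the device string is an assumption about what a build target list should contain, not something the LightOp docs specify. The device string is passed in as an argument so the helper runs without a GPU; in a real test it would typically come from `torch.cuda.get_device_properties(0).gcnArchName`.

```python
import os
from typing import Mapping, Optional


def resolve_rocm_arch(gcn_arch_name: Optional[str],
                      env: Mapping[str, str] = os.environ) -> str:
    """Pick the gfx target(s): explicit PYTORCH_ROCM_ARCH wins, else the device name."""
    configured = env.get("PYTORCH_ROCM_ARCH", "").strip()
    if configured:
        # Respect the user's setting verbatim, e.g. "gfx928;gfx936;gfx938".
        return configured
    if not gcn_arch_name:
        raise RuntimeError("PYTORCH_ROCM_ARCH is unset and no gcnArchName was given")
    # Assumption: keep only the base name when the device reports feature
    # suffixes, e.g. "gfx90a:sramecc+:xnack-" becomes "gfx90a".
    return gcn_arch_name.split(":", 1)[0]
```

Passing `env` explicitly keeps the helper easy to test; callers that want the real environment can omit it.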
...
@@ -283,7 +299,9 @@ docker exec <container> bash -lc 'cd <container-lightop> && PYTORCH_ROCM_ARCH="g
Regardless of the PyTorch version, do not switch to `setup_torch29.py`. In the
normal tuning loop, do not delete `build/`, and do not delete, move, or recreate
the `build/bdist.*`, `build/lib.*`, and `build/temp.*` subdirectories that
`python setup.py install` normally generates. They are the incremental build
cache and install products; keep and reuse them to avoid a full recompile every
round. Clean up only when the user explicitly requests a clean build or the
build cache is proven to be corrupt.

In other words, in an ordinary LightOp tuning task do not run `rm -rf build`,
do not run
...
@@ -312,6 +330,8 @@ python test_<op>.py
The final validation file must be under `test/`. A new operator must add
`test/test_<operator_name>.py`; when optimizing an existing operator, use the
test file the user specifies, and infer or create
`test/test_<operator_name>.py` only when none is specified. Keep only this one
official task test entry point under `test/`; temporary candidate tests,
standalone benchmarks, PMC parsing, shape sweeps, and baseline exploration
scripts all go under `.humanize/lightop-agent/`.

The test script must run accuracy validation first and performance testing
second. The accuracy reference can be PyTorch, Triton, an existing LightOp
composition path, or the oracle specified in the user's prompt. The performance
test must include 10 warmup iterations and
...
humanize/skills/humanize-kernel-agent-loop/SKILL.md
...
@@ -134,6 +134,37 @@ test/<family>/*benchmark*.py
```
lightop/config*.py
```

Use `.cu` for LightOp HIP/C++ kernel source files. Do not create or edit
hand-written `.hip` source files for LightOp operators; `.hip` files are treated
as build-generated artifacts when the local build produces them, not as the
source format to land in the repository.

LightOp development-conformance gate:

- Before editing, inspect the nearest same-family LightOp implementation and
  mirror its directory layout, file naming, C++ namespace style, include order,
  launch helper style, macro/dispatch pattern, error-checking idioms, and
  formatting.
- Match the existing Python wrapper style for imports, tensor/device/dtype
  checks, contiguity handling, optional arguments, return shape, and fallback or
  error behavior. Do not invent a new user-facing wrapper pattern when a local
  one exists.
- Match the existing `lightop/csrc/export.cpp` binding style, including symbol
  names, argument order, `m.def(...)` placement, and public API exposure policy.
- Match existing `test/` style for seeds, tolerances, dtype/shape coverage,
  synchronization, skip logic, and command-line behavior. Do not introduce a new
  test framework or many one-off test entrypoints for a single task.
- Match existing benchmark/config conventions before adding new tables,
  dispatcher branches, shape keys, environment variables, or benchmark CLI
  flags.
- Do not add unrelated dependencies, vendored kernels, generated source files,
  broad refactors, formatting churn, or foreign project layout unless the user
  explicitly asks and the reason is recorded in the plan.
- Before final reporting, run a conformance review: list changed files, the
  nearest LightOp files they follow, any intentional deviations, and confirm
  there are no hand-written `.hip` sources, unrelated operator-family edits, or
  extra final tests under `test/`.
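
The mechanical part of the conformance review in the last bullet could be sketched roughly as follows. Everything here is hypothetical: `ConformanceReview`, `review_changes`, and the file names in the usage note are illustrative, and the skill does not prescribe any tooling. The sketch only covers `.hip` files, missing reference records, and extra tests under `test/`; spotting unrelated operator-family edits still needs human or agent judgment.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


@dataclass
class ConformanceReview:
    references: Dict[str, str]               # changed file -> LightOp file it follows
    deviations: List[str] = field(default_factory=list)
    problems: List[str] = field(default_factory=list)


def review_changes(added_files: Iterable[str],
                   modified_files: Iterable[str],
                   references: Dict[str, str],
                   deviations: Iterable[str] = ()) -> ConformanceReview:
    """Collect the facts the conformance review asks for (repo-relative POSIX paths)."""
    added = list(added_files)
    review = ConformanceReview(dict(references), list(deviations))
    for path in added + list(modified_files):
        if PurePosixPath(path).suffix == ".hip":
            review.problems.append(f".hip file in change set: {path}")
        if path not in references:
            review.problems.append(f"no LightOp reference recorded for: {path}")
    # Only one final task test may be added under test/.
    new_tests = [p for p in added
                 if PurePosixPath(p).parts[:1] == ("test",)
                 and PurePosixPath(p).name.startswith("test_")]
    if len(new_tests) > 1:
        review.problems.append(f"more than one new test under test/: {new_tests}")
    return review
```

A call such as `review_changes(["test/test_fused_example.py"], ["lightop/csrc/export.cpp"], references={...})` returns an object whose `problems` list is empty when every changed file has a recorded reference and the checks pass.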
Typical add-operator checklist:

- For fused operators, inspect each component single-op wrapper, binding,
...
@@ -156,9 +187,12 @@ Typical add-operator checklist:
  relevant config/dispatcher table under `lightop/config*.py`.
- Add the final focused correctness/performance test script under `test/`,
  named `test_<operator_name>.py` after the public operator or target kernel
  name. Create only this one final task test under `test/`; put auxiliary
  benchmark harnesses, candidate tests, sweep configs, PMC parsers, temporary
  reference scripts, and other helper scripts under `.humanize/lightop-agent/`.
  Do not leave final validation scripts only under `.humanize/lightop-agent/`;
  that directory is for records and temporary evidence, not the LightOp test
  surface.

Optimization-only checklist:
...
@@ -180,7 +214,9 @@ Optimization-only checklist:
- For an existing-operator optimization task, the user only needs to specify
  the target operator and test file. Use that test file as the required final
  validation file. If the user does not specify one, infer or create
  `test/test_<operator_name>.py`. Do not add extra task-specific test files
  under `test/`; place helper benchmarks, parse scripts, and sweep scripts
  under `.humanize/lightop-agent/`.
- Record rejected lineages when an optimization regresses or only helps a
  non-target shape.
...
@@ -189,6 +225,9 @@ Optimization-only checklist:
- Treat PyTorch's `torch.cuda` namespace as the ROCm runtime facade when used
  by LightOp tests.
- Prefer `hipcc`/ROCm extension builds over NVIDIA-only compile paths.
- Use `.cu` as the checked-in LightOp kernel source extension. Do not create
  checked-in `.hip` operator files; if the build generates `.hip` files, leave
  them as build artifacts and do not treat them as source edits.
- Respect `PYTORCH_ROCM_ARCH`; if unset, derive gfx from
  `torch.cuda.get_device_properties(0).gcnArchName`.
- Prefer `gfx928;gfx936;gfx938` when the local LightOp setup already uses that
@@ -452,9 +491,13 @@ docker exec <container> bash -lc 'cd <container-lightop> && PYTORCH_ROCM_ARCH="g
...
@@ -452,9 +491,13 @@ docker exec <container> bash -lc 'cd <container-lightop> && PYTORCH_ROCM_ARCH="g
```
```
Keep the existing
`build/`
directory between attempts so incremental extension
Keep the existing
`build/`
directory between attempts so incremental extension
builds can reuse prior compilation output. Do not delete
`build/`
as part of
builds can reuse prior compilation output.
`python setup.py install`
may create
the normal build/test/tune loop unless the user explicitly requests a clean
or update normal setuptools subdirectories such as
`build/bdist.*`
,
build or the build cache is proven to be stale or corrupt.
`build/lib.*`
, and
`build/temp.*`
; leave them in place and reuse them on the
next build. Do not delete, rename, move, or recreate
`build/`
or its
subdirectories as part of the normal build/test/tune loop unless the user
explicitly requests a clean build or the build cache is proven to be stale or
corrupt.
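
One way an agent harness might enforce part of this rule is a pre-flight check on shell commands. The sketch below is a rough heuristic under stated assumptions: `check_command` and its patterns are hypothetical, and it only recognizes `rm` on `build/` paths and any use of `setup_torch29.py`. It does not catch renames, moves, or a host-side install for a Docker-bound task.

```python
import re
from typing import Optional

# Removing build/ or anything under it (e.g. build/temp.*) discards the
# incremental compilation cache that the loop is supposed to reuse.
_REMOVES_BUILD = re.compile(r"\brm\s+(?:-\S+\s+)*(?:\./)?build(?:/\S*)?(?=\s|$)")
_USES_TORCH29_SETUP = re.compile(r"\bsetup_torch29\.py\b")


def check_command(command: str, clean_build_allowed: bool = False) -> Optional[str]:
    """Return a reason string if the command breaks the build rules, else None."""
    if _USES_TORCH29_SETUP.search(command):
        return "uses setup_torch29.py"
    # A clean build is only acceptable on explicit user request or when the
    # cache is proven stale or corrupt; the caller signals that via the flag.
    if not clean_build_allowed and _REMOVES_BUILD.search(command):
        return "removes build/ or one of its subdirectories"
    return None
```

Returning a reason string rather than raising lets the caller log the rejected command alongside the explanation.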
Commands such as `rm -rf build`, `python setup_torch29.py install`, or a host
side `python setup.py install` for a Docker-bound task violate this skill unless
...
@@ -488,6 +531,12 @@ performance section with 10 warmup iterations and 100 timed iterations, using
explicit `torch.cuda.synchronize()` around timed regions. Report mean time in
microseconds and effective bandwidth for the target workload.

There should be one final task test file under `test/`. Put temporary candidate
tests, standalone benchmark scripts, PMC parsers, sweep configs, and exploratory
reference scripts under `.humanize/lightop-agent/`, then have the single final
`test/test_<operator_name>.py` cover the accepted correctness and performance
checks.
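
As a rough illustration of the performance section such a final test might contain, here is a minimal timing harness with 10 warmup and 100 timed iterations. It is a sketch, not LightOp's actual harness: the `benchmark` function, its parameters, and the definition of effective bandwidth as bytes moved per call divided by mean time are assumptions. The synchronization function is injected so the code runs without a GPU; a real test would pass `torch.cuda.synchronize` and run its correctness check before calling this.

```python
import time
from typing import Callable, Dict


def benchmark(op: Callable[[], object],
              bytes_moved: int,
              synchronize: Callable[[], None],
              warmup: int = 10,
              iters: int = 100) -> Dict[str, float]:
    """Time `op` and report mean latency (us) and effective bandwidth (GB/s)."""
    for _ in range(warmup):
        op()
    synchronize()                      # drain warmup work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        op()
    synchronize()                      # wait for every timed launch to finish
    mean_s = (time.perf_counter() - start) / iters
    return {
        "mean_us": mean_s * 1e6,
        # Assumption: bytes_moved counts all bytes read plus written per call.
        "effective_gb_per_s": bytes_moved / mean_s / 1e9,
    }
```

Synchronizing once before and once after the timed loop keeps queued warmup work out of the measurement while still amortizing launch overhead across all timed iterations.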

The correctness reference must be PyTorch, Triton, an existing LightOp
composition, or the exact user-provided oracle. The script may contain its own
small benchmark harness; do not treat `.humanize/lightop-agent/final_test.py`,
...
@@ -600,6 +649,9 @@ schema. Include acceptance criteria for:
  target effective-bandwidth threshold.
- Correctness coverage for `W`, edge cases, dtype/layout/mode boundaries, and
  baseline/reference parity.
- LightOp development-conformance checklist: nearest same-family files used as
  style references, wrapper/binding/config/test/benchmark conventions to follow,
  allowed deviations, and files that must not be touched.
- Build command, ROCm/DTK/PyTorch versions, `PYTORCH_ROCM_ARCH`, and device
  metadata.
- Benchmark method with warmup, repeats, synchronization, per-shape timing,
@@ -611,7 +663,8 @@ schema. Include acceptance criteria for:
...
@@ -611,7 +663,8 @@ schema. Include acceptance criteria for:
The file must run correctness first, then performance with 10 warmup and 100
The file must run correctness first, then performance with 10 warmup and 100
timed iterations, report average time in us and effective bandwidth, and use
timed iterations, report average time in us and effective bandwidth, and use
PyTorch, Triton, existing LightOp composition, or the user-provided oracle as
PyTorch, Triton, existing LightOp composition, or the user-provided oracle as
baseline/reference.
baseline/reference. Only one final task test file should be added under
`test/`
; auxiliary scripts belong under
`.humanize/lightop-agent/`
.
-
Per-candidate
`hipprof --pmc`
capture after every correctness-passing
-
Per-candidate
`hipprof --pmc`
capture after every correctness-passing
optimization edit, including artifact path, selected card, representative
optimization edit, including artifact path, selected card, representative
shape, cache counters or unavailable-counter reason, LDS/bank-conflict
shape, cache counters or unavailable-counter reason, LDS/bank-conflict
...
@@ -640,7 +693,7 @@ schema. Include acceptance criteria for:
...
@@ -640,7 +693,7 @@ schema. Include acceptance criteria for:
regimes.
regimes.
-
Final correctness matrix, benchmark matrix, fallback paths, unsupported
-
Final correctness matrix, benchmark matrix, fallback paths, unsupported
regimes, final target-hit guard validation from the required
`test/`
file,
regimes, final target-hit guard validation from the required
`test/`
file,
concise final result table, and residual risk.
concise final result table,
LightOp conformance review,
and residual risk.
## RLCR Startup
## RLCR Startup
...
...