Add development conventions

Commit 6889486d authored May 22, 2026 by whlwhlwhl

Showing with 84 additions and 11 deletions

docs/lightop-skills.zh-CN.md +22 -2

humanize/skills/humanize-kernel-agent-loop/SKILL.md +62 -9

--- a/docs/lightop-skills.zh-CN.md
+++ b/docs/lightop-skills.zh-CN.md
@@ -104,6 +104,17 @@ test/<family>/*benchmark*.py
 lightop/config*.py
 ```
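+LightOp operator sources use `.cu`. Do not hand-write or commit `.hip` operator source files; if the build
+generates `.hip` automatically, it is only a build artifact, not source that should land in the repository for this task.
+All changes must conform to LightOp's existing development conventions. Before writing code, the agent must find the nearest same-family
+implementation as a reference and follow its directory layout, file naming, C++ namespace/include/launch helper,
+wrapper argument validation, `export.cpp` binding, config/dispatcher, test, and benchmark style.
+Do not introduce unrelated dependencies, foreign project directory layouts, bulk reformatting, generated sources, or unrelated operator-family
+changes unless the user explicitly asks and the reason is recorded in the plan. Before delivery, run a conformance self-check: list which local
+LightOp files each modified file was modeled on, explain any intentional deviations, and confirm that there are no hand-written `.hip` sources,
+no unrelated changes, and no more than one final task test entry point under `test/`.
 When adding a fused operator, the agent should first search for the pre-fusion single operators and related fused implementations, and use local LightOp
 paths as the primary baseline for API, validation, benchmark, and performance expectations. Only when no local baseline is found should it record that
 the search came up empty and fall back to PyTorch or a literal oracle.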
@@ -120,7 +131,9 @@ lightop/config*.py
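 - Change `lightop/config*.py` or the dispatch table only when shape/gfx specialization is needed.
 - The final test script must live under `test/` and be named `test_<operator_name>.py`. Do not leave scripts such as
  `final_test.py` or `bench_baseline.py` only under `.humanize/lightop-agent/`;
-  `.humanize` is only for process records and temporary evidence, not the final LightOp test entry point.
+  `.humanize` is only for process records and temporary evidence, not the final LightOp test entry point. Each task adds or uses
+  only this one official unit-test entry point under `test/`; candidate tests, benchmark harnesses, sweep configs,
+  PMC parsing scripts, and other auxiliary material all go under `.humanize/lightop-agent/`.
 ### Checklist for optimizing an existing operator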
@@ -136,12 +149,15 @@ lightop/config*.py
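 - Run install, test, benchmark, profile, and tune directly in the specified environment.
 - For an existing-operator optimization task, the user only needs to specify the target operator and test file; if the user provides a test file, use
  that file as the final validation file. If none is provided, infer or create `test/test_<operator_name>.py`.
+  Do not pile up extra task scripts under `test/`; put other helper scripts under `.humanize/lightop-agent/`.
 - If an optimization lineage regresses or only helps non-target shapes, record the reason for rejecting it.
 ### DCU/ROCm default rules
 - In LightOp tests, treat PyTorch's `torch.cuda` namespace as the ROCm runtime facade.
 - Prefer `hipcc` and ROCm extension builds.
+- LightOp kernel sources land as `.cu`; do not generate or commit hand-written `.hip` operator files. If `.hip` files appear,
+  treat them as build-generated artifacts.
 - Respect `PYTORCH_ROCM_ARCH`; when it is unset, derive the architecture from `gcnArchName`.
 - Do not introduce CUDA-only headers, PTX/SASS, CUTLASS/CuTe, Nsight Compute, TMA/WGMMA, or other
  NVIDIA-specific assumptions.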
@@ -283,7 +299,9 @@ docker exec <container> bash -lc 'cd <container-lightop> && PYTORCH_ROCM_ARCH="g
 ```
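 Regardless of the PyTorch version, do not switch to `setup_torch29.py`. In the normal tuning loop, do not delete
-`build/`, so that incremental build results can be reused. Clean up only when the user explicitly requests a clean build or the build
+`build/`, and do not delete, move, or recreate the `build/bdist.*`, `build/lib.*`, and `build/temp.*`
+subdirectories that `python setup.py install` normally generates. They are the incremental build cache and install artifacts;
+keep and reuse them to avoid a full rebuild every round. Clean up only when the user explicitly requests a clean build or the build
 cache is proven to be corrupt.
 In other words, in an ordinary LightOp tuning task do not run `rm -rf build`, and do not run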
@@ -312,6 +330,8 @@ python test_<op>.py
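 The final validation file must be under `test/`. A new operator must add `test/test_<operator_name>.py`; when optimizing an existing operator,
 use the test file the user specifies, and infer or create `test/test_<operator_name>.py` only when none is specified.
+Keep only this one official task test entry point under `test/`; temporary candidate tests, standalone benchmarks, PMC
+parsing, shape sweeps, and baseline exploration scripts all go under `.humanize/lightop-agent/`.
 The test script must run accuracy validation first, then performance testing. The accuracy baseline may be PyTorch, Triton, an existing
 LightOp composition path, or the oracle specified in the user's prompt. Performance testing must include 10 warmup rounds and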

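The conformance self-check added in this file could be partly mechanized. The following is a minimal sketch, not part of the commit: `conformance_issues` is a hypothetical helper, and it encodes only two of the rules stated in the hunks above (no checked-in `.hip` sources, and at most one task test file under `test/`). It assumes the caller passes repo-relative POSIX paths.

```python
from pathlib import PurePosixPath


def conformance_issues(changed_files):
    """Return human-readable problems found in a list of changed repo paths.

    Hypothetical helper: it only covers the `.hip` rule and the
    single-test-entry rule; every other convention still needs manual review.
    """
    issues = []
    paths = [PurePosixPath(p) for p in changed_files]

    # Rule 1: `.hip` files are build artifacts, never checked-in sources.
    for p in paths:
        if p.suffix == ".hip":
            issues.append(f"hand-written .hip source: {p}")

    # Rule 2: at most one task test file anywhere under test/.
    tests = [
        p for p in paths
        if p.parts[:1] == ("test",)
        and p.name.startswith("test_")
        and p.suffix == ".py"
    ]
    if len(tests) > 1:
        names = ", ".join(str(p) for p in tests)
        issues.append(f"multiple task tests under test/: {names}")

    return issues
```

One way to use it would be to feed it the output of `git diff --name-only` for the task branch and treat a non-empty result as a reason to stop before final reporting.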
--- a/humanize/skills/humanize-kernel-agent-loop/SKILL.md
+++ b/humanize/skills/humanize-kernel-agent-loop/SKILL.md
@@ -134,6 +134,37 @@ test/<family>/*benchmark*.py
 lightop/config*.py
 ```
+Use `.cu` for LightOp HIP/C++ kernel source files. Do not create or edit
+hand-written `.hip` source files for LightOp operators; `.hip` files are treated
+as build-generated artifacts when the local build produces them, not as the
+source format to land in the repository.
+LightOp development-conformance gate:
+- Before editing, inspect the nearest same-family LightOp implementation and
+  mirror its directory layout, file naming, C++ namespace style, include order,
+  launch helper style, macro/dispatch pattern, error-checking idioms, and
+  formatting.
+- Match the existing Python wrapper style for imports, tensor/device/dtype
+  checks, contiguity handling, optional arguments, return shape, and fallback or
+  error behavior. Do not invent a new user-facing wrapper pattern when a local
+  one exists.
+- Match the existing `lightop/csrc/export.cpp` binding style, including symbol
+  names, argument order, `m.def(...)` placement, and public API exposure policy.
+- Match existing `test/` style for seeds, tolerances, dtype/shape coverage,
+  synchronization, skip logic, and command-line behavior. Do not introduce a new
+  test framework or many one-off test entrypoints for a single task.
+- Match existing benchmark/config conventions before adding new tables,
+  dispatcher branches, shape keys, environment variables, or benchmark CLI
+  flags.
+- Do not add unrelated dependencies, vendored kernels, generated source files,
+  broad refactors, formatting churn, or foreign project layout unless the user
+  explicitly asks and the reason is recorded in the plan.
+- Before final reporting, run a conformance review: list changed files, the
+  nearest LightOp files they follow, any intentional deviations, and confirm
+  there are no hand-written `.hip` sources, unrelated operator-family edits, or
+  extra final tests under `test/`.
 Typical add-operator checklist:
 - For fused operators, inspect each component single-op wrapper, binding,
@@ -156,9 +187,12 @@ Typical add-operator checklist:
  relevant config/dispatcher table under `lightop/config*.py`.
 - Add the final focused correctness/performance test script under `test/`,
  named `test_<operator_name>.py` after the public operator or target kernel
-  name. Do not leave final validation scripts only under
-  `.humanize/lightop-agent/`; that directory is for records and temporary
-  evidence, not the LightOp test surface.
+  name. Create only this one final task test under `test/`; put auxiliary
+  benchmark harnesses, candidate tests, sweep configs, PMC parsers, temporary
+  reference scripts, and other helper scripts under `.humanize/lightop-agent/`.
+  Do not leave final validation scripts only under `.humanize/lightop-agent/`;
+  that directory is for records and temporary evidence, not the LightOp test
+  surface.
 Optimization-only checklist:
@@ -180,7 +214,9 @@ Optimization-only checklist:
 - For an existing-operator optimization task, the user only needs to specify
  the target operator and test file. Use that test file as the required final
  validation file. If the user does not specify one, infer or create
-  `test/test_<operator_name>.py`.
+  `test/test_<operator_name>.py`. Do not add extra task-specific test files
+  under `test/`; place helper benchmarks, parse scripts, and sweep scripts
+  under `.humanize/lightop-agent/`.
 - Record rejected lineages when an optimization regresses or only helps a
  non-target shape.
@@ -189,6 +225,9 @@ Optimization-only checklist:
 - Treat PyTorch's `torch.cuda` namespace as the ROCm runtime facade when used
  by LightOp tests.
 - Prefer `hipcc`/ROCm extension builds over NVIDIA-only compile paths.
+- Use `.cu` as the checked-in LightOp kernel source extension. Do not create
+  checked-in `.hip` operator files; if the build generates `.hip` files, leave
+  them as build artifacts and do not treat them as source edits.
 - Respect `PYTORCH_ROCM_ARCH`; if unset, derive gfx from
  `torch.cuda.get_device_properties(0).gcnArchName`.
 - Prefer `gfx928;gfx936;gfx938` when the local LightOp setup already uses that
@@ -452,9 +491,13 @@ docker exec <container> bash -lc 'cd <container-lightop> && PYTORCH_ROCM_ARCH="g
 ```
 Keep the existing `build/` directory between attempts so incremental extension
-builds can reuse prior compilation output. Do not delete `build/` as part of
-the normal build/test/tune loop unless the user explicitly requests a clean
-build or the build cache is proven to be stale or corrupt.
+builds can reuse prior compilation output. `python setup.py install` may create
+or update normal setuptools subdirectories such as `build/bdist.*`,
+`build/lib.*`, and `build/temp.*`; leave them in place and reuse them on the
+next build. Do not delete, rename, move, or recreate `build/` or its
+subdirectories as part of the normal build/test/tune loop unless the user
+explicitly requests a clean build or the build cache is proven to be stale or
+corrupt.
 Commands such as `rm -rf build`, `python setup_torch29.py install`, or a host
 side `python setup.py install` for a Docker-bound task violate this skill unless
@@ -488,6 +531,12 @@ performance section with 10 warmup iterations and 100 timed iterations, using
 explicit `torch.cuda.synchronize()` around timed regions. Report mean time in
 microseconds and effective bandwidth for the target workload.
+There should be one final task test file under `test/`. Put temporary candidate
+tests, standalone benchmark scripts, PMC parsers, sweep configs, and exploratory
+reference scripts under `.humanize/lightop-agent/`, then have the single final
+`test/test_<operator_name>.py` cover the accepted correctness and performance
+checks.
 The correctness reference must be PyTorch, Triton, an existing LightOp
 composition, or the exact user-provided oracle. The script may contain its own
 small benchmark harness; do not treat `.humanize/lightop-agent/final_test.py`,
@@ -600,6 +649,9 @@ schema. Include acceptance criteria for:
  target effective-bandwidth threshold.
 - Correctness coverage for `W`, edge cases, dtype/layout/mode boundaries, and
  baseline/reference parity.
+- LightOp development-conformance checklist: nearest same-family files used as
+  style references, wrapper/binding/config/test/benchmark conventions to follow,
+  allowed deviations, and files that must not be touched.
 - Build command, ROCm/DTK/PyTorch versions, `PYTORCH_ROCM_ARCH`, and device
  metadata.
 - Benchmark method with warmup, repeats, synchronization, per-shape timing,
@@ -611,7 +663,8 @@ schema. Include acceptance criteria for:
  The file must run correctness first, then performance with 10 warmup and 100
  timed iterations, report average time in us and effective bandwidth, and use
  PyTorch, Triton, existing LightOp composition, or the user-provided oracle as
-  baseline/reference.
+  baseline/reference. Only one final task test file should be added under
+  `test/`; auxiliary scripts belong under `.humanize/lightop-agent/`.
 - Per-candidate `hipprof --pmc` capture after every correctness-passing
  optimization edit, including artifact path, selected card, representative
  shape, cache counters or unavailable-counter reason, LDS/bank-conflict
@@ -640,7 +693,7 @@ schema. Include acceptance criteria for:
  regimes.
 - Final correctness matrix, benchmark matrix, fallback paths, unsupported
  regimes, final target-hit guard validation from the required `test/` file,
-  concise final result table, and residual risk.
+  concise final result table, LightOp conformance review, and residual risk.
 ## RLCR Startup
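The performance section that SKILL.md requires of the final test (10 warmup iterations, 100 timed iterations, synchronization around the timed region, mean time in microseconds, effective bandwidth) might be structured roughly as below. This is a sketch under stated assumptions, not code from the commit: `benchmark` and `report` are hypothetical names, `sync` is expected to be something like `torch.cuda.synchronize`, and `bytes_moved` is assumed to be the total bytes the operator reads plus writes for the target shape.

```python
import time


def benchmark(fn, sync, warmup=10, iters=100):
    """Return mean seconds per call of `fn`, synchronizing around the timed region."""
    for _ in range(warmup):
        fn()
    sync()  # drain warmup work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for all timed work to finish before stopping the clock
    return (time.perf_counter() - start) / iters


def report(mean_s, bytes_moved):
    """Convert a mean time and a byte count into (microseconds, GB/s)."""
    mean_us = mean_s * 1e6
    gbps = bytes_moved / mean_s / 1e9  # assumption: bytes_moved = reads + writes
    return mean_us, gbps
```

In a real `test/test_<operator_name>.py` the call would look something like `benchmark(lambda: op(x), torch.cuda.synchronize)`, where `op` and `x` stand in for the operator under test and its input, and it would run only after the correctness checks have passed.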