Commit 73900188 authored by whlwhlwhl's avatar whlwhlwhl
Browse files

完善测试脚本的编写

parent 0fec721c
...@@ -118,7 +118,9 @@ lightop/config*.py ...@@ -118,7 +118,9 @@ lightop/config*.py
- 只有算子需要对用户公开时,才改 `lightop/__init__.py` - 只有算子需要对用户公开时,才改 `lightop/__init__.py`
- 只有新增 `csrc/<family>``setup.py` glob 覆盖不到时,才改 `setup.py` - 只有新增 `csrc/<family>``setup.py` glob 覆盖不到时,才改 `setup.py`
- 需要 shape/gfx 特化时,才改 `lightop/config*.py` 或 dispatch 表。 - 需要 shape/gfx 特化时,才改 `lightop/config*.py` 或 dispatch 表。
- 添加聚焦的正确性测试和目标 workload 的 benchmark。 - 最终测试脚本必须放在 `test/` 下,命名为 `test_<算子名>.py`。不要只把
`final_test.py``bench_baseline.py` 等脚本放在 `.humanize/lightop-agent/` 里;
`.humanize` 只用于过程记录和临时证据,不是 LightOp 最终测试入口。
### 已有算子优化 Checklist ### 已有算子优化 Checklist
...@@ -132,6 +134,8 @@ lightop/config*.py ...@@ -132,6 +134,8 @@ lightop/config*.py
- 不修改无关 operator family。 - 不修改无关 operator family。
- 优先只改目标算子的 kernel、launcher、必要 config、聚焦 test、benchmark。 - 优先只改目标算子的 kernel、launcher、必要 config、聚焦 test、benchmark。
- 直接在指定环境里 install、test、benchmark、profile、tune。 - 直接在指定环境里 install、test、benchmark、profile、tune。
- 已有算子优化任务里,用户只需要指定目标算子和测试文件;如果用户给了测试文件,就以
该文件作为最终验证文件。如果没有给,就推断或创建 `test/test_<算子名>.py`
- 如果某条优化线回退或只对非目标 shape 有帮助,要记录 reject 原因。 - 如果某条优化线回退或只对非目标 shape 有帮助,要记录 reject 原因。
### DCU/ROCm 默认规则 ### DCU/ROCm 默认规则
...@@ -306,8 +310,16 @@ cd test ...@@ -306,8 +310,16 @@ cd test
python test_<op>.py python test_<op>.py
``` ```
如果没有 benchmark,就添加小 benchmark,必须包含 warmup、固定 shape、固定 seed、 最终验证文件必须在 `test/` 下。新增算子必须新增 `test/test_<算子名>.py`;优化已有算子
计时前后显式 `torch.cuda.synchronize()` 时使用用户指定的测试文件,没有指定时再推断或创建 `test/test_<算子名>.py`
测试脚本必须先做精度验证,再做性能测试。精度基准可以是 PyTorch、Triton、已有
LightOp 组合路径,或者用户提示里指定的 oracle。性能测试必须包含 10 轮 warmup 和
100 轮 timed iterations,计时前后显式 `torch.cuda.synchronize()`,最终报告平均时间,
单位是 us,同时包含有效带宽。
最终答复和 `kernel_opt_readme.md` 里要用简短 Markdown 表格呈现最终验证信息,至少包含:
测试文件、reference、shape、dtype、精度结果、平均耗时 us、有效带宽、选卡、实际命令。
每个正确性通过的优化 candidate 都要在 benchmark 后跑 PMC: 每个正确性通过的优化 candidate 都要在 benchmark 后跑 PMC:
......
...@@ -154,7 +154,11 @@ Typical add-operator checklist: ...@@ -154,7 +154,11 @@ Typical add-operator checklist:
explicitly asks for legacy metadata maintenance. explicitly asks for legacy metadata maintenance.
- If performance depends on shape/gfx-specific choices, update or add the - If performance depends on shape/gfx-specific choices, update or add the
relevant config/dispatcher table under `lightop/config*.py`. relevant config/dispatcher table under `lightop/config*.py`.
- Add focused correctness tests under `test/` and benchmark coverage for `W`. - Add the final focused correctness/performance test script under `test/`,
named `test_<operator_name>.py` after the public operator or target kernel
name. Do not leave final validation scripts only under
`.humanize/lightop-agent/`; that directory is for records and temporary
evidence, not the LightOp test surface.
Optimization-only checklist: Optimization-only checklist:
...@@ -173,6 +177,10 @@ Optimization-only checklist: ...@@ -173,6 +177,10 @@ Optimization-only checklist:
target. target.
- Keep edits scoped to the operator family, binding, tests, benchmark, and - Keep edits scoped to the operator family, binding, tests, benchmark, and
tuning table needed for the target task. tuning table needed for the target task.
- For an existing-operator optimization task, the user only needs to specify
the target operator and test file. Use that test file as the required final
validation file. If the user does not specify one, infer or create
`test/test_<operator_name>.py`.
- Record rejected lineages when an optimization regresses or only helps a - Record rejected lineages when an optimization regresses or only helps a
non-target shape. non-target shape.
...@@ -472,10 +480,24 @@ cd test ...@@ -472,10 +480,24 @@ cd test
python test_<op>.py python test_<op>.py
``` ```
Then run the relevant benchmark script and compare it against the named Final validation for a new operator must live in `test/test_<operator_name>.py`.
baseline and threshold. If no benchmark exists, add a small benchmark that uses For an optimization task, use the user-specified test file, or infer/create
warmup, fixed shapes, fixed seeds, and explicit `torch.cuda.synchronize()` `test/test_<operator_name>.py` when none is given. The final test file must
around timed regions. first run numerical correctness against the chosen reference, then run a
performance section with 10 warmup iterations and 100 timed iterations, using
explicit `torch.cuda.synchronize()` around timed regions. Report mean time in
microseconds and effective bandwidth for the target workload.
The correctness reference must be PyTorch, Triton, an existing LightOp
composition, or the exact user-provided oracle. The script may contain its own
small benchmark harness; do not treat `.humanize/lightop-agent/final_test.py`,
`.humanize/lightop-agent/bench_baseline.py`, or other record-only scripts as
the final LightOp validation surface.
The final answer and `kernel_opt_readme.md` must show the final validation in a
concise Markdown table with at least: test file, reference, shape, dtype,
correctness result, mean time in us, effective bandwidth, selected card, and
command.
Before every benchmark, run the performance device gate from this skill: Before every benchmark, run the performance device gate from this skill:
capture `hy-smi` or `rocm-smi`, choose a low-utilization/low-VRAM card, and run capture `hy-smi` or `rocm-smi`, choose a low-utilization/low-VRAM card, and run
...@@ -583,6 +605,13 @@ schema. Include acceptance criteria for: ...@@ -583,6 +605,13 @@ schema. Include acceptance criteria for:
- Benchmark method with warmup, repeats, synchronization, per-shape timing, - Benchmark method with warmup, repeats, synchronization, per-shape timing,
p50/p90 or mean as appropriate, variance/noise band, minimum effective delta, p50/p90 or mean as appropriate, variance/noise band, minimum effective delta,
and environment metadata. and environment metadata.
- Required final test file policy: new operators must add
`test/test_<operator_name>.py`; optimization tasks must use the
user-specified test file or infer/create `test/test_<operator_name>.py`.
The file must run correctness first, then performance with 10 warmup and 100
timed iterations, report average time in us and effective bandwidth, and use
PyTorch, Triton, existing LightOp composition, or the user-provided oracle as
baseline/reference.
- Per-candidate `hipprof --pmc` capture after every correctness-passing - Per-candidate `hipprof --pmc` capture after every correctness-passing
optimization edit, including artifact path, selected card, representative optimization edit, including artifact path, selected card, representative
shape, cache counters or unavailable-counter reason, LDS/bank-conflict shape, cache counters or unavailable-counter reason, LDS/bank-conflict
...@@ -610,7 +639,8 @@ schema. Include acceptance criteria for: ...@@ -610,7 +639,8 @@ schema. Include acceptance criteria for:
- Tuning decisions and dispatcher/config updates when `W` has multiple - Tuning decisions and dispatcher/config updates when `W` has multiple
regimes. regimes.
- Final correctness matrix, benchmark matrix, fallback paths, unsupported - Final correctness matrix, benchmark matrix, fallback paths, unsupported
regimes, final target-hit guard validation, and residual risk. regimes, final target-hit guard validation from the required `test/` file,
concise final result table, and residual risk.
## RLCR Startup ## RLCR Startup
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment