Add performance requirements

Commit ce50b2f9 authored May 19, 2026 by whlwhlwhl

Showing with 42 additions and 0 deletions

humanize/skills/humanize-kernel-agent-loop/SKILL.md +42 -0
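
The gating rules this commit adds to SKILL.md (first miss triggers tuning, two consecutive misses trigger research plus profiling, unmet targets are never reported as complete) can be read as a small decision function. The following is only an illustrative sketch of that reading, not code from the repository: `Candidate`, `next_action`, its parameters, and the returned labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one attempted kernel or dispatch edit."""
    correct: bool        # passed the correctness checks
    meets_target: bool   # measured result satisfies the user's target


def next_action(history, lineages_tried, target_unreachable=False):
    """Return the next loop step for candidates listed oldest first.

    lineages_tried: number of evidence-backed optimization lineages tried.
    target_unreachable: True when profiler evidence shows the target cannot
    be reached under the current constraints.
    """
    passing = [c for c in history if c.correct]
    if not passing:
        return "keep-working-on-correctness"
    if any(c.meets_target for c in passing):
        return "target-met"
    # From here on, every correctness-passing candidate misses the target.
    if target_unreachable or lineages_tried >= 3:
        # Summarize the best candidate and failed lineages; never claim
        # the operator is performance-complete.
        return "report-best-result-not-complete"
    if len(passing) >= 2:
        # Two consecutive passing misses: knowledge research and a
        # profiler digest must both precede the next edit.
        return "research-and-profile-before-next-edit"
    return "profile-and-tune"
```

Under these assumptions, a single correct candidate that misses the target yields `"profile-and-tune"`, and a second correct miss escalates to `"research-and-profile-before-next-edit"`; incorrect candidates in between do not reset the count.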

--- a/humanize/skills/humanize-kernel-agent-loop/SKILL.md
+++ b/humanize/skills/humanize-kernel-agent-loop/SKILL.md
@@ -189,6 +189,9 @@ hipcc --version
   regression, plateau, and improvement.
 5. Invoke `dcu-profiler-report` when benchmark evidence is not enough to choose
   the next edit.
+6. If the first correctness-passing candidate misses the required performance
+   threshold, continue into profiling and tuning instead of declaring the task
+   done.

 ### Stage 3: Tune And Integrate

@@ -200,6 +203,36 @@ hipcc --version
 5. Summarize final code paths, fallback behavior, unsupported regimes, and
   remaining risks.

+## Performance Target Discipline
+
+When the user gives a bandwidth, latency, throughput, or speedup target, treat
+that threshold as part of the acceptance contract.
+
+- Do not claim completion while the best correctness-passing candidate misses
+  the target. Report it only as the current best result with bottleneck
+  evidence and next steps.
+- For every correctness-passing candidate, record shape, dtype, layout/mode,
+  kernel or dispatch configuration, measured bandwidth/latency, comparison
+  baseline, and the reason it improved, regressed, or plateaued.
+- After the first correctness-passing candidate misses the target, run a
+  profiling-and-tuning loop before stopping.
+- Try at least three evidence-backed performance optimization lineages unless
+  profiler evidence shows the target is not reachable under the current `K/R/W`
+  and environment constraints.
+- If two consecutive correctness-passing candidates miss the target, the next
+  kernel or dispatch edit must be preceded by both:
+  - A `lightop-kernel-knowledge` research pass covering local LightOp
+    layernorm/rmsnorm/fused-norm patterns, relevant ROCm/DCU upstream evidence,
+    and any portable reduction/vectorization ideas from the bundled corpus.
+  - A `dcu-profiler-report` digest for a representative target shape.
+- The next edit after that gate must name exactly one concrete LightOp kernel,
+  binding, dispatcher, config, or benchmark change and cite the knowledge and
+  profiler evidence that motivated it.
+- If the target remains unmet after the required tuning lineages, summarize the
+  best candidate, failed lineages, profiler bottleneck class, unsupported
+  regimes, and the most likely next engineering investment. Do not present the
+  operator as performance-complete.
+
 ## Required Loop State

 Keep Humanize state local and untracked:
@@ -273,6 +306,11 @@ next source of truth. These are heuristics, not user-facing gates:
 - A correct candidate is within +/-2% of baseline or the prior best.
 - A correct candidate regresses on an important shape.
 - The benchmark plateaued and the next edit is unclear.
+- The first correctness-passing candidate misses the user's required
+  performance target.
+- Two consecutive correctness-passing candidates miss the target, in which case
+  pair this profile with a `lightop-kernel-knowledge` research pass before the
+  next kernel or dispatch edit.
 - A candidate is much faster than expected and needs explanation.
 - A reviewer asks for profiling evidence.

@@ -302,6 +340,10 @@ schema. Include acceptance criteria for:
  evidence that materially changes the route.
 - Attempt ledger for every candidate.
 - Optimization ledger only for correct candidates with measured improvement.
+- Performance-target discipline: the first miss triggers tuning, two
+  consecutive correctness-passing misses trigger both `lightop-kernel-knowledge`
+  research and `dcu-profiler-report` evidence before the next edit, and unmet
+  targets cannot be reported as complete.
 - Tuning decisions and dispatcher/config updates when `W` has multiple
  regimes.
 - Final correctness matrix, benchmark matrix, fallback paths, unsupported