Commit ce50b2f9 authored by whlwhlwhl's avatar whlwhlwhl
Browse files

添加性能要求

parent c842f8f1
Pipeline #3634 failed with stages
in 0 seconds
......@@ -189,6 +189,9 @@ hipcc --version
regression, plateau, and improvement.
5. Invoke `dcu-profiler-report` when benchmark evidence is not enough to choose
the next edit.
6. If the first correctness-passing candidate misses the required performance
threshold, continue into profiling and tuning instead of declaring the task
done.
### Stage 3: Tune And Integrate
......@@ -200,6 +203,36 @@ hipcc --version
5. Summarize final code paths, fallback behavior, unsupported regimes, and
remaining risks.
## Performance Target Discipline
When the user gives a bandwidth, latency, throughput, or speedup target, treat
that threshold as part of the acceptance contract.
- Do not claim completion while the best correctness-passing candidate misses
the target. Report it only as the current best result with bottleneck
evidence and next steps.
- For every correctness-passing candidate, record shape, dtype, layout/mode,
kernel or dispatch configuration, measured bandwidth/latency, comparison
baseline, and the reason it improved, regressed, or plateaued.
- After the first correctness-passing candidate misses the target, run a
profiling-and-tuning loop before stopping.
- Try at least three evidence-backed performance optimization lineages unless
profiler evidence shows the target is not reachable under the current `K/R/W`
and environment constraints.
- If two consecutive correctness-passing candidates miss the target, the next
kernel or dispatch edit must be preceded by both:
- A `lightop-kernel-knowledge` research pass covering local LightOp
layernorm/rmsnorm/fused-norm patterns, relevant ROCm/DCU upstream evidence,
and any portable reduction/vectorization ideas from the bundled corpus.
- A `dcu-profiler-report` digest for a representative target shape.
- The next edit after that gate must name exactly one concrete LightOp kernel,
binding, dispatcher, config, or benchmark change and cite the knowledge and
profiler evidence that motivated it.
- If the target remains unmet after the required tuning lineages, summarize the
best candidate, failed lineages, profiler bottleneck class, unsupported
regimes, and the most likely next engineering investment. Do not present the
operator as performance-complete.
## Required Loop State
Keep Humanize state local and untracked:
......@@ -273,6 +306,11 @@ next source of truth. These are heuristics, not user-facing gates:
- A correct candidate is within +/-2% of baseline or the prior best.
- A correct candidate regresses on an important shape.
- The benchmark plateaued and the next edit is unclear.
- The first correctness-passing candidate misses the user's required
performance target.
- Two consecutive correctness-passing candidates miss the target, in which case
pair this profile with a `lightop-kernel-knowledge` research pass before the
next kernel or dispatch edit.
- A candidate is much faster than expected and needs explanation.
- A reviewer asks for profiling evidence.
......@@ -302,6 +340,10 @@ schema. Include acceptance criteria for:
evidence that materially changes the route.
- Attempt ledger for every candidate.
- Optimization ledger only for correct candidates with measured improvement.
- Performance-target discipline: the first miss triggers tuning, two
consecutive correctness-passing misses trigger both `lightop-kernel-knowledge`
research and `dcu-profiler-report` evidence before the next edit, and unmet
targets cannot be reported as complete.
- Tuning decisions and dispatcher/config updates when `W` has multiple
regimes.
- Final correctness matrix, benchmark matrix, fallback paths, unsupported
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment