Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
whlwhlwhl
Lightop-SKIILS
Commits
ce50b2f9
Commit
ce50b2f9
authored
May 19, 2026
by
whlwhlwhl
Browse files
添加性能要求
parent
c842f8f1
Pipeline
#3634
failed with stages
in 0 seconds
Changes
1
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
42 additions
and
0 deletions
+42
-0
humanize/skills/humanize-kernel-agent-loop/SKILL.md
humanize/skills/humanize-kernel-agent-loop/SKILL.md
+42
-0
No files found.
humanize/skills/humanize-kernel-agent-loop/SKILL.md
View file @
ce50b2f9
...
...
@@ -189,6 +189,9 @@ hipcc --version
regression, plateau, and improvement.
5.
Invoke
`dcu-profiler-report`
when benchmark evidence is not enough to choose
the next edit.
6.
If the first correctness-passing candidate misses the required performance
threshold, continue into profiling and tuning instead of declaring the task
done.
### Stage 3: Tune And Integrate
...
...
@@ -200,6 +203,36 @@ hipcc --version
5.
Summarize final code paths, fallback behavior, unsupported regimes, and
remaining risks.
## Performance Target Discipline
When the user gives a bandwidth, latency, throughput, or speedup target, treat
that threshold as part of the acceptance contract.
-
Do not claim completion while the best correctness-passing candidate misses
the target. Report it only as the current best result with bottleneck
evidence and next steps.
-
For every correctness-passing candidate, record shape, dtype, layout/mode,
kernel or dispatch configuration, measured bandwidth/latency, comparison
baseline, and the reason it improved, regressed, or plateaued.
-
After the first correctness-passing candidate misses the target, run a
profiling-and-tuning loop before stopping.
-
Try at least three evidence-backed performance optimization lineages unless
profiler evidence shows the target is not reachable under the current
`K/R/W`
and environment constraints.
-
If two consecutive correctness-passing candidates miss the target, the next
kernel or dispatch edit must be preceded by both:
-
A
`lightop-kernel-knowledge`
research pass covering local LightOp
layernorm/rmsnorm/fused-norm patterns, relevant ROCm/DCU upstream evidence,
and any portable reduction/vectorization ideas from the bundled corpus.
-
A
`dcu-profiler-report`
digest for a representative target shape.
-
The next edit after that gate must name exactly one concrete LightOp kernel,
binding, dispatcher, config, or benchmark change and cite the knowledge and
profiler evidence that motivated it.
-
If the target remains unmet after the required tuning lineages, summarize the
best candidate, failed lineages, profiler bottleneck class, unsupported
regimes, and the most likely next engineering investment. Do not present the
operator as performance-complete.
## Required Loop State
Keep Humanize state local and untracked:
...
...
@@ -273,6 +306,11 @@ next source of truth. These are heuristics, not user-facing gates:
-
A correct candidate is within +/-2% of baseline or the prior best.
-
A correct candidate regresses on an important shape.
-
The benchmark plateaued and the next edit is unclear.
-
The first correctness-passing candidate misses the user's required
performance target.
-
Two consecutive correctness-passing candidates miss the target, in which case
pair this profile with a
`lightop-kernel-knowledge`
research pass before the
next kernel or dispatch edit.
-
A candidate is much faster than expected and needs explanation.
-
A reviewer asks for profiling evidence.
...
...
@@ -302,6 +340,10 @@ schema. Include acceptance criteria for:
evidence that materially changes the route.
-
Attempt ledger for every candidate.
-
Optimization ledger only for correct candidates with measured improvement.
-
Performance-target discipline: the first miss triggers tuning, two
consecutive correctness-passing misses trigger both
`lightop-kernel-knowledge`
research and
`dcu-profiler-report`
evidence before the next edit, and unmet
targets cannot be reported as complete.
-
Tuning decisions and dispatcher/config updates when
`W`
has multiple
regimes.
-
Final correctness matrix, benchmark matrix, fallback paths, unsupported
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment