whlwhlwhl
Lightop-SKIILS
Commit 067b04c0, authored May 21, 2026 by whlwhlwhl
Add fused operator limit
parent 4b893124
Showing 1 changed file with 44 additions and 3 deletions
humanize/skills/humanize-kernel-agent-loop/SKILL.md (+44 -3)
...
@@ -105,6 +105,15 @@ that split and both paths point at the same compiled extension.
For a new or modified operator, inspect the nearest existing operator family
first and follow its style.
For a new fused operator, first recover the fusion ingredients from the new
operator name, requested semantics, or provided implementation sketch. Search
LightOp for the pre-fusion single operators and related fused implementations
before designing a new kernel. Use those local implementations as the primary
baseline for API shape, tensor validation, dispatch/config style, correctness
reference, benchmark comparison, and performance expectations. If LightOp has
no matching local baseline, record the search terms and absence, then fall back
to a PyTorch or literal oracle reference.
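
The search described above can be sketched in Python. This is a minimal illustration, not part of the skill: the helper names, the `GENERIC_TOKENS` and `SOURCE_SUFFIXES` sets, and the example operator name `fused_rmsnorm_quant` are all assumptions, and matching on file paths alone is a simplification of a real search over names and semantics.

```python
import re
from pathlib import Path

# Assumption: generic words that carry no component-operator meaning.
GENERIC_TOKENS = {"fused", "fuse", "op", "kernel"}
# Assumption: source suffixes worth scanning; adjust to the real tree.
SOURCE_SUFFIXES = {".py", ".cpp", ".h", ".hpp", ".hip", ".cu"}


def name_tokens(op_name: str) -> list[str]:
    """Split an operator name into lowercase candidate component tokens."""
    parts = re.split(r"[^a-z0-9]+", op_name.lower())
    return [p for p in parts if p and p not in GENERIC_TOKENS]


def search_baselines(root: Path, op_name: str) -> dict[str, list[str]]:
    """Map each name token to source files under root whose path contains it."""
    hits: dict[str, list[str]] = {tok: [] for tok in name_tokens(op_name)}
    if not root.is_dir():
        return hits
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        rel = path.relative_to(root).as_posix()
        for tok in hits:
            if tok in rel.lower():
                hits[tok].append(rel)
    return hits


def baseline_note(op_name: str, hits: dict[str, list[str]]) -> str:
    """Record the search terms and either the matches or their absence."""
    lines = [
        f"operator: {op_name}",
        f"search terms: {', '.join(hits) or '(none)'}",
    ]
    found = {tok: files for tok, files in hits.items() if files}
    if found:
        for tok, files in found.items():
            lines.append(f"{tok}: {', '.join(files)}")
    else:
        lines.append(
            "no local baseline found; "
            "fall back to a PyTorch or literal oracle reference"
        )
    return "\n".join(lines)
```

Calling `baseline_note(name, search_baselines(Path("lightop"), name))` yields text that could be pasted into the research digest. Recording the search terms even when nothing matches keeps the "absence" claim auditable.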
Core source locations:
```
text
```
text
...
@@ -120,6 +129,12 @@ lightop/config*.py
Typical add-operator checklist:
- For fused operators, inspect each component single-op wrapper, binding,
  kernel, test, benchmark, and config path, plus any neighboring fused kernels
  with similar data movement or epilogue structure.
- Build the first correctness and benchmark baseline from the unfused LightOp
  composition when those component operators exist; otherwise use the nearest
  LightOp implementation plus a PyTorch reference.
- Add or modify HIP/C++ implementation under the closest `lightop/csrc/`
  family. Create a new family only when no existing family fits.
- Expose the C++ symbol in `lightop/csrc/export.cpp` with `m.def(...)`.
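
As an illustration of the "unfused composition as baseline" item above, the sketch below checks a fused candidate against the composition of its component operators. Everything here is hypothetical: `add_op`, `relu_op`, and `fused_add_relu` are plain-Python stand-ins rather than LightOp operators, and the tolerances are placeholder values.

```python
import math


def add_op(x: list[float], y: list[float]) -> list[float]:
    """Hypothetical stand-in for an existing single operator."""
    return [a + b for a, b in zip(x, y)]


def relu_op(x: list[float]) -> list[float]:
    """Hypothetical stand-in for a second existing single operator."""
    return [max(v, 0.0) for v in x]


def unfused_reference(x: list[float], y: list[float]) -> list[float]:
    """Baseline: compose the existing single operators in order."""
    return relu_op(add_op(x, y))


def fused_add_relu(x: list[float], y: list[float]) -> list[float]:
    """Hypothetical stand-in for the new fused candidate."""
    return [max(a + b, 0.0) for a, b in zip(x, y)]


def matches(out: list[float], ref: list[float],
            rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Elementwise comparison of candidate output against the reference."""
    return len(out) == len(ref) and all(
        math.isclose(o, r, rel_tol=rtol, abs_tol=atol)
        for o, r in zip(out, ref)
    )
```

The same composition can serve as the benchmark baseline: timing `unfused_reference` and the fused candidate on identical inputs shows what the fusion actually buys.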
...
@@ -218,13 +233,18 @@ evidence.
   threshold.
3. Inspect the existing wrapper, binding, kernel, config table, tests, and
   benchmarks.
4. For a new fused operator, search by the requested operator name, name
   tokens, component op names, and semantics to find LightOp's pre-fusion
   single operators and related fused kernels. Record the chosen baseline:
   unfused LightOp composition, nearest fused LightOp implementation, PyTorch
   reference, or explicit "no local baseline found".
5. Before the first optimization edit, it is recommended to query
   `lightop-kernel-knowledge` for local LightOp patterns, ROCm/DCU upstream
   evidence, Hygon/DCU source references, and portable ideas from the bundled
   corpus. Use this whenever it can shape the first implementation route.
6. Write a concise research digest in the loop state before the first serious
   implementation lineage.
7. Define the benchmark contract before editing code: exact target shape(s),
   dtype/layout/contiguity, axis/mode/epsilon, effective-bandwidth formula,
   warmup/repeat counts, selected summary statistic, noise band, and the
   benchmark command that will be used for baseline and candidates.
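
One way to pin the benchmark contract down before touching code is to encode it as data. The sketch below is an assumption-laden illustration: the `BenchmarkContract` class, its field names, and the bandwidth formula (bytes read plus bytes written, divided by elapsed time) are not taken from the skill, which leaves the exact formula to the author of the contract.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class BenchmarkContract:
    """Hypothetical container for the fields fixed before editing code."""
    shape: tuple[int, ...]
    dtype_bytes: int     # bytes per element, e.g. 2 for a 16-bit dtype
    reads: int           # full-tensor reads counted by the formula
    writes: int          # full-tensor writes counted by the formula
    warmup: int          # untimed iterations
    repeat: int          # timed iterations
    noise_band: float    # relative gain below which a change is noise

    def bytes_moved(self) -> int:
        elements = 1
        for dim in self.shape:
            elements *= dim
        return elements * self.dtype_bytes * (self.reads + self.writes)

    def effective_bandwidth_gbs(self, seconds: float) -> float:
        """Assumed formula: (bytes read + bytes written) / time, in GB/s."""
        return self.bytes_moved() / seconds / 1e9


def summarize(contract: BenchmarkContract, samples_s: list[float]) -> float:
    """Summary statistic (median here) over exactly `repeat` timed samples."""
    if len(samples_s) != contract.repeat:
        raise ValueError("sample count does not match the contract")
    return median(samples_s)


def is_real_gain(contract: BenchmarkContract,
                 baseline_s: float, candidate_s: float) -> bool:
    """True when the relative speedup exceeds the agreed noise band."""
    return (baseline_s - candidate_s) / baseline_s > contract.noise_band
```

Freezing the dataclass reflects the intent of the step: baseline and candidates are measured under the same contract, so the contract should not change mid-loop.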
...
@@ -327,6 +347,7 @@ Keep Humanize state local and untracked:
.humanize/lightop-agent/research-digest.md
.humanize/lightop-agent/attempt-ledger.md
.humanize/lightop-agent/kernel_opt_readme.md
.humanize/lightop-agent/rlcr-fallback.md
.humanize/lightop-agent/optimization-ledger.md
.humanize/lightop-agent/lineage.jsonl
.humanize/lightop-agent/performance-map.json
...
@@ -456,6 +477,9 @@ Write `.humanize/lightop-agent/refined-plan.md` using the Humanize gen-plan
schema. Include acceptance criteria for:
- LightOp root, target operator family, public API, and modified files.
- For new fused operators, the LightOp pre-fusion single-op search result,
  related fused implementation search result, and chosen baseline/reference
  path.
- Explicit `K`, `R`, `W`, target gfx arch, baseline command, comparison target,
  and hard scope exclusions.
- Workload contract: target shape(s), dtype, layout/contiguity, axis/mode,
...
@@ -508,6 +532,23 @@ the loop from the LightOp root:
```
```
If setup exits non-zero, stop and report the error. Do not bypass the gate.
Exception: if setup fails only because the `codex` CLI is unavailable, manual
fallback mode is allowed. Before continuing, write:
```text
.humanize/lightop-agent/rlcr-fallback.md
.humanize/lightop-agent/refined-plan.md
.humanize/lightop-agent/research-digest.md
.humanize/lightop-agent/attempt-ledger.md
.humanize/lightop-agent/kernel_opt_readme.md
```
`rlcr-fallback.md` must state that the Codex review gate is unavailable, include
the exact setup command and error output, name the missing dependency, and
declare that all build/test/benchmark/profile, device-selection, evidence,
performance-target, low-gain, and logging constraints from this skill still
apply. In fallback mode, proceed manually with the same optimization loop, but
do not claim that Humanize/Codex review was active.
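
A minimal sketch of producing `rlcr-fallback.md` follows. The function names and the record layout are assumptions, and the setup command is passed in as a parameter because it is not shown in this hunk. Note that the `fallback_allowed` check is only a necessary condition: whether setup failed *only* because `codex` is missing still has to be confirmed from the error output.

```python
import shutil
from pathlib import Path

STATE_DIR = Path(".humanize/lightop-agent")
# Constraint list copied from the skill text above.
CONSTRAINTS = (
    "build/test/benchmark/profile, device-selection, evidence, "
    "performance-target, low-gain, and logging"
)


def codex_missing() -> bool:
    """True when no `codex` executable is found on PATH."""
    return shutil.which("codex") is None


def fallback_allowed(setup_returncode: int, codex_is_missing: bool) -> bool:
    """Necessary (not sufficient) condition for manual fallback mode."""
    return setup_returncode != 0 and codex_is_missing


def render_fallback_record(setup_command: str, error_output: str,
                           missing_dependency: str = "codex CLI") -> str:
    """Build the text of rlcr-fallback.md (layout is an assumption)."""
    return "\n".join([
        "# RLCR fallback",
        "",
        "Codex review gate is unavailable; "
        "Humanize/Codex review was not active.",
        f"Setup command: {setup_command}",
        f"Missing dependency: {missing_dependency}",
        "",
        "Setup error output:",
        error_output.rstrip(),
        "",
        f"All {CONSTRAINTS} constraints from this skill still apply.",
    ]) + "\n"


def write_fallback_record(text: str, state_dir: Path = STATE_DIR) -> Path:
    """Write the record into the local, untracked Humanize state directory."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / "rlcr-fallback.md"
    target.write_text(text, encoding="utf-8")
    return target
```

Keeping rendering separate from writing makes it easy to review the record text before it lands in the state directory.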
After setup succeeds:
...