Commit 067b04c0 authored by whlwhlwhl's avatar whlwhlwhl
Browse files

添加融合算子limit

parent 4b893124
......@@ -105,6 +105,15 @@ that split and both paths point at the same compiled extension.
For a new or modified operator, inspect the nearest existing operator family
first and follow its style.
For a new fused operator, first recover the fusion ingredients from the new
operator name, requested semantics, or provided implementation sketch. Search
LightOp for the pre-fusion single operators and related fused implementations
before designing a new kernel. Use those local implementations as the primary
baseline for API shape, tensor validation, dispatch/config style, correctness
reference, benchmark comparison, and performance expectations. If LightOp has
no matching local baseline, record the search terms and absence, then fall back
to a PyTorch or literal oracle reference.
Core source locations:
```text
......@@ -120,6 +129,12 @@ lightop/config*.py
Typical add-operator checklist:
- For fused operators, inspect each component single-op wrapper, binding,
kernel, test, benchmark, and config path, plus any neighboring fused kernels
with similar data movement or epilogue structure.
- Build the first correctness and benchmark baseline from the unfused LightOp
composition when those component operators exist; otherwise use the nearest
LightOp implementation plus a PyTorch reference.
- Add or modify HIP/C++ implementation under the closest `lightop/csrc/`
family. Create a new family only when no existing family fits.
- Expose the C++ symbol in `lightop/csrc/export.cpp` with `m.def(...)`.
......@@ -218,13 +233,18 @@ evidence.
threshold.
3. Inspect the existing wrapper, binding, kernel, config table, tests, and
benchmarks.
4. Before the first optimization edit, it is recommended to query
4. For a new fused operator, search by the requested operator name, name
tokens, component op names, and semantics to find LightOp's pre-fusion
single operators and related fused kernels. Record the chosen baseline:
unfused LightOp composition, nearest fused LightOp implementation, PyTorch
reference, or explicit "no local baseline found".
5. Before the first optimization edit, it is recommended to query
`lightop-kernel-knowledge` for local LightOp patterns, ROCm/DCU upstream
evidence, Hygon/DCU source references, and portable ideas from the bundled
corpus. Use this whenever it can shape the first implementation route.
5. Write a concise research digest in the loop state before the first serious
6. Write a concise research digest in the loop state before the first serious
implementation lineage.
6. Define the benchmark contract before editing code: exact target shape(s),
7. Define the benchmark contract before editing code: exact target shape(s),
dtype/layout/contiguity, axis/mode/epsilon, effective-bandwidth formula,
warmup/repeat counts, selected summary statistic, noise band, and the
benchmark command that will be used for baseline and candidates.
......@@ -327,6 +347,7 @@ Keep Humanize state local and untracked:
.humanize/lightop-agent/research-digest.md
.humanize/lightop-agent/attempt-ledger.md
.humanize/lightop-agent/kernel_opt_readme.md
.humanize/lightop-agent/rlcr-fallback.md
.humanize/lightop-agent/optimization-ledger.md
.humanize/lightop-agent/lineage.jsonl
.humanize/lightop-agent/performance-map.json
......@@ -456,6 +477,9 @@ Write `.humanize/lightop-agent/refined-plan.md` using the Humanize gen-plan
schema. Include acceptance criteria for:
- LightOp root, target operator family, public API, and modified files.
- For new fused operators, the LightOp pre-fusion single-op search result,
related fused implementation search result, and chosen baseline/reference
path.
- Explicit `K`, `R`, `W`, target gfx arch, baseline command, comparison target,
and hard scope exclusions.
- Workload contract: target shape(s), dtype, layout/contiguity, axis/mode,
......@@ -508,6 +532,23 @@ the loop from the LightOp root:
```
If setup exits non-zero, stop and report the error. Do not bypass the gate.
Exception: if setup fails only because the `codex` CLI is unavailable, manual
fallback mode is allowed. Before continuing, write:
```text
.humanize/lightop-agent/rlcr-fallback.md
.humanize/lightop-agent/refined-plan.md
.humanize/lightop-agent/research-digest.md
.humanize/lightop-agent/attempt-ledger.md
.humanize/lightop-agent/kernel_opt_readme.md
```
`rlcr-fallback.md` must state that Codex review gate is unavailable, include
the exact setup command and error output, name the missing dependency, and
declare that all build/test/benchmark/profile, device-selection, evidence,
performance-target, low-gain, and logging constraints from this skill still
apply. In fallback mode, proceed manually with the same optimization loop, but
do not claim that Humanize/Codex review was active.
After setup succeeds:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment