description: Start a debugging session with worklog file
user-invocable: true
disable-model-invocation: true
---
# Start Debug Session
Create a structured debugging session for an issue in the Dynamo ecosystem.
## Step 1: Get the Bug Report
Ask the user how they want to provide the bug:
**Option A: Linear ticket**
- User provides ticket ID (e.g., "DYN-123")
- Fetch via Linear MCP tools
- Extract: title, description, reproduction steps
**Option B: GitHub issue**
- User provides issue URL
- Fetch via `gh issue view <url>`
- Extract: title, description, reproduction steps
**Option C: Paste**
- Ask user to paste the bug report directly
- Parse out the key details
## Step 2: Discover Environment
Gather environment information:
!`nvidia-smi --query-gpu=name,count --format=csv,noheader 2>/dev/null || echo "No GPU detected"`
!`uname -a`
!`which python && python --version`
This tells you:
- GPU type and count (L40s, H100s, etc.)
- OS/platform
- Python environment
**Note**: The user's `~/.claude/CLAUDE.md` may have more details about their dev environment (paths, aliases, preferences). Check there for additional context.
## Step 3: Create Worklog
Create a worklog file to track the investigation:
- Filename: `<issue-slug>.md` in current directory
- Template:
```markdown
# Debug: [Issue Title]
**Date**: [today's date]
**Source**: [Linear ticket / GitHub issue / user report]
```
description: File a GitHub bug issue against ai-dynamo/dynamo using context from the current conversation.
user-invocable: true
---
# File a Dynamo Bug Issue
Use the current conversation context to file a well-structured bug report against `ai-dynamo/dynamo` via the `gh` CLI.
## Instructions
1. **Gather context from the conversation.** Review what the user has been working on, the problem encountered, error messages, logs, stack traces, and any reproduction steps already discussed. If critical details are missing, ask the user briefly, but prefer inferring from conversation context over asking.
2. **Collect environment info.** Determine whether the user is running in a **Kubernetes** or **local development** environment based on conversation context. Then gather the relevant environment details:
For **Kubernetes** environments:
- K8s version / distribution (e.g., EKS, GKE, kind)
- Dynamo runtime version / container image tag
- Node OS and CPU architecture
- CUDA version and GPU architecture (if applicable)
- Python version (if applicable)
- Helm chart version or manifest details
For **local development** environments:
- OS and version
- Dynamo runtime version
- CPU architecture
- CUDA version and GPU architecture (if applicable)
- Python version
Use shell commands to auto-detect what you can (e.g., `uname -m`, `python3 --version`, `nvidia-smi`, `kubectl version`). Fill in what's available and mark unknowns as "N/A".
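The auto-detection described above could be sketched as follows. This is a minimal sketch, not part of the skill itself: `detect_or_na` is a hypothetical helper name, and the fields shown are only a subset of the template's environment section.

```bash
# Hypothetical helper: print the first line of a command's stdout, or "N/A"
# when the command is missing, exits non-zero, or prints nothing.
detect_or_na() {
  out=""
  if command -v "$1" >/dev/null 2>&1; then
    out=$("$@" 2>/dev/null) || out=""
  fi
  printf '%s\n' "${out:-N/A}" | head -n 1
}

# Fill in a few of the environment fields from the issue template.
echo "- **OS:** $(detect_or_na uname -sr)"
echo "- **CPU Architecture:** $(detect_or_na uname -m)"
echo "- **GPU Architecture:** $(detect_or_na nvidia-smi --query-gpu=name --format=csv,noheader)"
echo "- **Python Version:** $(detect_or_na python3 --version)"
```

Routing every probe through one helper keeps the "mark unknowns as N/A" rule in a single place, so a missing `nvidia-smi` or `python3` never aborts the collection.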
3. **Draft the issue** using this template and present it to the user for review before filing:
```
**Describe the Bug**
<clear, concise description>
**Steps to Reproduce**
1. ...
2. ...
<!-- Include relevant manifests or public container references if applicable -->
**Expected Behavior**
<what should have happened>
**Actual Behavior**
<what actually happened — include error messages, logs, or stack traces>
**Environment**
- **OS:** ...
- **Dynamo Runtime Version:** ...
- **CPU Architecture:** ...
- **CUDA Version:** ...
- **GPU Architecture:** ...
- **Python Version:** ...
<!-- Add K8s-specific fields if applicable -->
```
4. **Show the draft to the user** and ask for confirmation or edits before filing.
**Check if full CI should be running.** The dynamo repo has two tiers of CI:
- **Lightweight (`pre-merge.yml`)**: Triggers on all `pull_request` events. Runs pre-commit, copyright checks, DCO, and optionally rust-clippy/rust-tests (if rust filter matches). This always runs.
- **Full pipeline (`pr.yaml`, `container-validation-dynamo.yml`)**: Triggers on `push` to `main` or `pull-request/[0-9]+` branches ONLY. This includes docker builds, GPU tests, deploy tests. It does NOT trigger on regular PR branches.
To get full CI on a PR, a `pull-request/$PR_NUMBER` branch must exist (created by copy-pr-bot after an NVIDIA maintainer approves). Check:
```bash
gh api repos/ai-dynamo/dynamo/branches/pull-request/$PR_NUMBER 2>/dev/null
```
If the branch does not exist, full CI has not been triggered. Common reasons:
1. **Awaiting approval**: An NVIDIA maintainer needs to comment `/ok to test <commit_sha>` on the PR to create the branch and trigger full CI. This applies to both fork PRs and internal PRs from authors not yet in the approval list.
2. **DCO failure**: Commits are not signed (`Signed-off-by` line missing). Check for a DCO bot comment. Fix: author signs their commits (see `DCO.md`).
3. **Draft PR**: Some checks may not run until marked ready for review.
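For the DCO case, a local check for the missing `Signed-off-by` line might look like the sketch below. `has_signoff` is a hypothetical helper, and the actual DCO bot may apply stricter rules (for example, matching the sign-off against the commit author), so treat this as a quick pre-check only.

```bash
# Hypothetical helper: succeed when the commit message on stdin contains
# a non-empty "Signed-off-by:" line.
has_signoff() {
  grep -q '^Signed-off-by: .'
}

# Example (run inside a git checkout):
#   git log -1 --format=%B | has_signoff || echo "HEAD is missing a sign-off"
```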
**Verify which workflows actually ran** against the PR's HEAD commit:
```bash
gh api "repos/ai-dynamo/dynamo/actions/runs?head_sha=$HEAD_SHA" --jq '.workflow_runs[] | {name: .name, status: .status, conclusion: .conclusion}'
```
Compare the workflows that ran against what's expected. If only `Pre Merge`, `Copyright Checks`, `DCO Commenter`, etc. appear but NOT `PR` or `Dynamo Validation`, then full CI was not triggered.
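The comparison above can be reduced to a single check on the workflow names. The sketch below assumes the names arrive one per line on stdin; `full_ci_triggered` is a hypothetical helper, and the two workflow names are the ones this document uses.

```bash
# Hypothetical helper: read workflow names (one per line) on stdin and
# succeed if a full-CI workflow ("PR" or "Dynamo Validation") is among them.
full_ci_triggered() {
  grep -qxE 'PR|Dynamo Validation'
}

# Example, feeding it only the names from the workflow-run query:
#   gh api "repos/ai-dynamo/dynamo/actions/runs?head_sha=$HEAD_SHA" \
#     --jq '.workflow_runs[].name' | full_ci_triggered \
#     || echo "Only lightweight CI ran"
```

The `-x` flag forces whole-line matches, so `Pre Merge` is not mistaken for `PR`.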
**Determine which CI filters are active.** If full CI ran, read the `changed-files` job output from the actual workflow run — this is the authoritative source for which filters are `true`:
```bash
# Find the PR workflow run ID
PR_RUN_ID=$(gh api "repos/ai-dynamo/dynamo/actions/runs?head_sha=$HEAD_SHA" --jq '.workflow_runs[] | select(.name == "PR") | .id')
# Get the changed-files job output (look for the filter results in the logs)
```
If full CI hasn't run, fall back to fetching `filters.yaml` and matching manually:
```bash
gh api repos/ai-dynamo/dynamo/contents/.github/filters.yaml --jq '.content' | base64 -d
```
Key rules for manual matching:
- `core` being true triggers ALL framework pipelines (vllm, sglang, trtllm).
- Filters use YAML anchors (e.g., `*ci`) — resolve these when reading.
- Negation patterns like `!**/*.md` in a filter mean markdown-only changes do NOT trigger that filter.
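As an illustration of the negation rule, the sketch below checks whether a change set is markdown-only. It is a deliberate simplification, not the real filter engine: `only_markdown_changed` is a hypothetical helper, and it assumes the changed paths arrive one per line on stdin.

```bash
# Hypothetical helper: succeed only when at least one path is given on stdin
# and every path ends in ".md".
only_markdown_changed() {
  seen=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    seen=1
    case "$path" in
      *.md) ;;          # markdown file: keep checking
      *) return 1 ;;    # any other file means the change is not markdown-only
    esac
  done
  [ "$seen" -eq 1 ]
}

# Example:
#   gh pr diff "$PR_NUMBER" --repo ai-dynamo/dynamo --name-only | only_markdown_changed \
#     && echo "Filters with a !**/*.md negation should stay false"
```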
**Jobs that always run in `pr.yaml` regardless of filters:**
- `changed-files`, `deploy-operator`, `backend-status-check`, `clean-k8s-builder`, `cleanup`: these run unconditionally or with `if: always()`.
## Step 2: CI Status Dashboard
Fetch all check runs. Note: `gh pr checks` returns non-zero exit codes when checks are pending or failed — this is normal, not an error.
```bash
gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo
```
If no checks are returned at all, refer back to the Step 1 diagnosis (DCO, approval, draft status).
**Ignore external CI checks** (e.g., GitLab mirror `ci/gitlab/*`). These are NVIDIA-internal pipelines that cannot be inspected from GitHub. Only analyze GitHub Actions checks.
**Distinguish two situations:**
1. **Full CI triggered** (workflow runs include `PR`, `Dynamo Validation`): Analyze all jobs normally.
2. **Only lightweight CI ran** (`Pre Merge` and utility workflows only): Report this clearly. The filter predictions from Step 1 describe what *would* run once full CI triggers, but there's nothing to analyze yet beyond the lightweight checks.
If full CI ran, **identify the critical path** — which checks are most relevant to this PR's changes based on the filter mapping from Step 1.
Produce a concise dashboard grouped by status:
- **Failed**: needs immediate attention
- **Pending/In-progress**: still running (note which are critical path)
Then fetch logs for specific failed jobs. The check URL format from `gh pr checks` is `https://github.com/.../actions/runs/{RUN_ID}/job/{JOB_ID}`. Extract the `RUN_ID` (first number):
```bash
gh run view $RUN_ID --repo ai-dynamo/dynamo --log-failed 2>&1 | tail -200
```
Note: `--log-failed` concatenates all failed job logs, which can be noisy. For multi-failure runs, prefer fetching per-job to isolate root causes.
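Extracting the IDs from a check URL can be sketched with plain `sed`. The helper names and the numeric IDs in the sample URL are made up for illustration; only the URL shape comes from the text above.

```bash
# Hypothetical helpers: pull the numeric IDs out of a check URL of the form
# https://github.com/<owner>/<repo>/actions/runs/<RUN_ID>/job/<JOB_ID>
extract_run_id() {
  printf '%s\n' "$1" | sed -n 's#.*/actions/runs/\([0-9][0-9]*\).*#\1#p'
}
extract_job_id() {
  printf '%s\n' "$1" | sed -n 's#.*/job/\([0-9][0-9]*\).*#\1#p'
}

# Sample URL with made-up IDs.
url="https://github.com/ai-dynamo/dynamo/actions/runs/123456789/job/987654321"
RUN_ID=$(extract_run_id "$url")
JOB_ID=$(extract_job_id "$url")
echo "run=$RUN_ID job=$JOB_ID"   # prints: run=123456789 job=987654321
```

With the job ID in hand, per-job logs can likely be fetched with `gh run view --job "$JOB_ID" --repo ai-dynamo/dynamo --log-failed`, assuming the installed `gh` version supports combining those flags.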
For each failure, report:
- **Job name** and which workflow it belongs to
- **Root cause**: the actual error (compilation error, test assertion, timeout, infra issue, DCO sign-off failure, etc.)
- **Relevant log excerpt**: the key lines (max 20 lines), not the full dump
- **Suggested fix**: concrete action the PR author can take
- If it looks like an infra flake, include: `gh run rerun $RUN_ID --repo ai-dynamo/dynamo --failed`
If there are no failures, say so and move on.
## Step 4: Skip & Discrepancy Analysis
This step is **exception-based only**. Do NOT enumerate expected skips — that's noise.
**If full CI did not trigger**, skip this step entirely — there are no jobs to analyze. The absence of full CI was already explained in Steps 1-2.
**If full CI ran**, compare the filter results from Step 1 against actual CI results from Step 2. Only report **surprises**:
- A filter should be `true` (files matched its paths) but the corresponding job was skipped or missing
- A gate job (`backend-status-check`, `dynamo-status-check`) was skipped — these use `if: always()` and should always run
- All framework pipelines skipped when `core` files changed (core should trigger all frameworks)
**Things that are NOT surprises** (do not report):
- Jobs skipped because their filter is `false` (e.g., docs jobs skipped when no docs changed)
- Multi-GPU tests skipped on pre-merge (these are gated to post-merge/nightly)
- arm64 copy-to-acr jobs skipped (only triggered on merge to main)
- arm64 GPU tests skipped (GPU tests are amd64-only)
- Downstream jobs skipped because their upstream was legitimately skipped
- External CI (GitLab) status — ignored entirely
- `deploy-operator` running despite `operator=false` (this job always runs in `pr.yaml`)
If everything matches expectations, say "No unexpected skips or discrepancies" and move on.
## Step 5: Actionable Summary
Synthesize into a concise report:
**PR Health: [PASSING | FAILING | PENDING | CI NOT TRIGGERED | PARTIAL — lightweight only]**
**If full CI not triggered:**
- "Full CI awaiting approval — an NVIDIA maintainer needs to comment `/ok to test <sha>` to create the `pull-request/$PR_NUMBER` branch."
- "DCO check failed — commits need to be signed. See DCO.md."
- "Draft PR — some checks may not run until marked ready for review."
- Note which lightweight checks passed/failed.
**Blocking issues** — failures that must be fixed before merge:
- One-line root cause + suggested fix for each
- Note if `backend-status-check` or `dynamo-status-check` gate is failing
**Non-blocking issues** (if any):
- Flaky tests, infra timeouts, unexpected skips
**Critical path status** — the checks most relevant to this PR's changes:
- List them with current status (passed/failed/pending/not triggered)
- If pending, suggest re-checking in ~15 minutes
**Next steps** — concrete actions ordered by priority:
- "An NVIDIA maintainer should comment `/ok to test <sha>`" if full CI hasn't triggered
- "Sign your commits with `git commit --amend -s`" for DCO failures
- "Fix X in file Y" for code failures
- Re-run command for infra flakes: `gh run rerun $RUN_ID --repo ai-dynamo/dynamo --failed`
- "No action needed — CI is green" if everything passed
## Step 6: Monitor Pending Checks
If any checks are still pending or in-progress, offer to monitor them.
- Estimated wait time based on similar completed jobs (e.g., if `vllm-cuda12.9-amd64 / Test` took 20m and `vllm-cuda13.0-amd64 / Test` is still running, estimate ~20m remaining)
When all checks complete, re-run the summary from Step 5 with final results. Report any checks that changed from pending to failed since the last check.
## Behavior Notes
- **Concurrency cancellation**: If a PR has rapid pushes, earlier runs get cancelled. Note if you see cancelled runs and suggest checking the latest run instead.
- **Large log output**: Always truncate to relevant sections. Never dump more than 50 lines of raw log in the summary.
- **Rate limits**: If `gh` commands fail due to rate limiting, report what you could gather and suggest retrying later.
- **Multiple workflows**: A single push can trigger `pr.yaml`, `pre-merge.yml`, and `container-validation-dynamo.yml`. Check all of them.
- **`pull-request/[0-9]+` branches**: Created by copy-pr-bot after maintainer approval. Required for full CI; applies to both fork and internal PRs.
- **`external-contribution` label**: Fork PRs get this label automatically. Its presence confirms the PR is from an external contributor.
- **External CI (GitLab)**: Ignore `ci/gitlab/*` checks entirely. These are NVIDIA-internal and cannot be diagnosed from GitHub.
description: Generate optimized tool call parsers for dynamo from HuggingFace model chat templates. Use this when you need to add support for a new model's tool calling format. Takes a HuggingFace model name, analyzes its chat template, compares with existing parsers, and either maps to existing parser or generates new Rust code with tests for the dynamo tool_calling library.
license: "Apache-2.0"
---
# Tool Parser Generator Skill
Add support for new models' tool calling formats by analyzing their chat templates and generating appropriate parser implementations for dynamo.
## When to Use This Skill
- User asks to add tool calling support for a specific HuggingFace model
- User wants to understand how a model structures tool calls
- User needs to extend dynamo's parser library with new formats
## Workflow
Follow this systematic workflow when the user provides a HuggingFace model name.
### Phase 1: Fetch and Extract Chat Template
1. **Fetch tokenizer config from HuggingFace Hub**: