feat: initial claude skills (#6703)

Co-authored-by: tmontfort <tmontfort@nvidia.com>

Unverified commit 7546c193, authored Feb 28, 2026 by ishandhanani and committed by GitHub on Feb 28, 2026
8 changed files
--- a/.claude/skills/debug-session/SKILL.md
+++ b/.claude/skills/debug-session/SKILL.md
+---
+name: debug-session
+description: Start a debugging session with worklog file
+user-invocable: true
+disable-model-invocation: true
+---
+# Start Debug Session
+Create a structured debugging session for an issue in the Dynamo ecosystem.
+## Step 1: Get the Bug Report
+Ask the user how they want to provide the bug:
+**Option A: Linear ticket**
+- User provides ticket ID (e.g., "DYN-123")
+- Fetch via Linear MCP tools
+- Extract: title, description, reproduction steps
+**Option B: GitHub issue**
+- User provides issue URL
+- Fetch via `gh issue view <url>`
+- Extract: title, description, reproduction steps
+**Option C: Paste**
+- Ask user to paste the bug report directly
+- Parse out the key details
+## Step 2: Discover Environment
+Gather environment information:
+!`nvidia-smi --query-gpu=name,count --format=csv,noheader 2>/dev/null || echo "No GPU detected"`
+!`uname -a`
+!`which python && python --version`
+This tells you:
+- GPU type and count (L40s, H100s, etc.)
+- OS/platform
+- Python environment
+**Note**: The user's `~/.claude/CLAUDE.md` may have more details about their dev environment (paths, aliases, preferences). Check there for additional context.
+## Step 3: Create Worklog
+Create a worklog file to track the investigation:
+- Filename: `<issue-slug>.md` in the current directory
+- Template:
+```markdown
+# Debug: [Issue Title]
+**Date**: [today's date]
+**Source**: [Linear ticket / GitHub issue / user report]
+**Status**: investigating
+**Environment**: [GPU type/count from nvidia-smi]
+## Problem
+[Description of the issue]
+## Reproduction Steps
+1. [Step to reproduce]
+2. ...
+## Expected vs Actual
+- **Expected**:
+- **Actual**:
+## Investigation Log
+### [timestamp]
+[Notes on what you tried/found]
+## Root Cause
+[Fill in when found]
+## Fix
+[Fill in when implemented]
+```
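A minimal sketch of deriving `<issue-slug>` from the issue title, assuming that lowercase ASCII letters and digits joined by single hyphens make an acceptable slug. The function name `worklog_filename` and the `debug-session` fallback are hypothetical and not part of the skill:

```rust
/// Hypothetical helper: turns an issue title into `<issue-slug>.md`.
/// Non-alphanumeric runs collapse into a single hyphen.
fn worklog_filename(title: &str) -> String {
    let mut slug = String::new();
    for c in title.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    let slug = slug.trim_end_matches('-');
    // Assumed fallback when the title has no usable characters.
    let slug = if slug.is_empty() { "debug-session" } else { slug };
    format!("{slug}.md")
}
```

For example, a Linear title like `DYN-123: KV router drops requests!` would produce `dyn-123-kv-router-drops-requests.md`.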
+## Step 4: Set Up Testing
+### Build Commands
+Rebuild Dynamo after making changes:
+```bash
+cd lib/bindings/python && maturin develop --uv && cd ../../.. && uv pip install -e .
+```
+If a framework change is required (sglang, vllm, trtllm), check the user's `~/.claude/CLAUDE.md` for rebuild instructions specific to that framework.
+### Running Examples
+Examples are located under `examples/backends/` in the Dynamo repo root (e.g., `/home/ubuntu/dynamo/examples/backends/`).
+Available backends:
+- `sglang/launch/` - SGLang backend examples
+- `vllm/launch/` - vLLM backend examples
+- `trtllm/launch/` - TensorRT-LLM backend examples
+Based on the bug report, determine which backend is relevant:
+- If unclear, **ask the user** which backend/example to run
+- Run the example in the background
+- Wait for the model to be ready
+### Verifying the Model is Up
+```bash
+curl localhost:8000/v1/models
+```
+### Testing with a Request
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "<model-name-from-above>",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "max_tokens": 50
+  }'
+```
+## Step 5: Begin Investigation
+### Dynamo Infrastructure Debugging
+**KV cache and routing issues:**
+- Check KV event handling and its log output in `lib/llm/src/block_manager/kv_consolidator/tracker.rs`
+- Look at block manager state and consolidation behavior
+- Inspect routing decisions in the KV-aware router
+**ZMQ / networking issues:**
+- Check ZMQ socket configuration and endpoint bindings
+- Look for connection timeouts or message drops
+- Verify NATS/etcd connectivity for service discovery
+**Multi-node / disaggregated issues:**
+- Check prefill/decode worker assignment
+- Verify DGD (DynamoGraphDeployment) status reporting
+- Inspect GPU state and topology on each node (`nvidia-smi`, `nvidia-smi topo -m`)
+- Check NCCL and GPUDirect RDMA status (e.g., rerun with `NCCL_DEBUG=INFO` to log the transports NCCL selects)
+**Process inspection:**
+- `ps aux | grep dynamo` - check running processes
+- `nvidia-smi` - GPU utilization and memory
+- `ss -tlnp | grep 8000` - check port bindings
+- `journalctl -u dynamo` - systemd logs if applicable
+### General Debugging Workflow
+1. **Reproduce first** - verify you can trigger the bug before attempting fixes
+2. **Document as you go** - update the worklog with findings
+3. **Minimal changes** - fix the bug, do not refactor surrounding code
+4. **Verify the fix** - confirm the reproduction case now passes
+This is performance-critical code: avoid unnecessary abstractions and comments.
--- a/.claude/skills/gh-issue-bug/SKILL.md
+++ b/.claude/skills/gh-issue-bug/SKILL.md
+---
+name: dynamo-bug
+description: File a GitHub bug issue against ai-dynamo/dynamo using context from the current conversation.
+user-invocable: true
+---
+# File a Dynamo Bug Issue
+Use the current conversation context to file a well-structured bug report against `ai-dynamo/dynamo` via the `gh` CLI.
+## Instructions
+1. **Gather context from the conversation.** Review what the user has been working on, the problem encountered, error messages, logs, stack traces, and any reproduction steps already discussed. If critical details are missing, ask the user briefly — but prefer inferring from conversation context over asking.
+2. **Collect environment info.** Determine whether the user is running in a **Kubernetes** or **local development** environment based on conversation context. Then gather the relevant environment details:
+   For **Kubernetes** environments:
+   - K8s version / distribution (e.g., EKS, GKE, kind)
+   - Dynamo runtime version / container image tag
+   - Node OS and CPU architecture
+   - CUDA version and GPU architecture (if applicable)
+   - Python version (if applicable)
+   - Helm chart version or manifest details
+   For **local development** environments:
+   - OS and version
+   - Dynamo runtime version
+   - CPU architecture
+   - CUDA version and GPU architecture (if applicable)
+   - Python version
+   Use shell commands to auto-detect what you can (e.g., `uname -m`, `python3 --version`, `nvidia-smi`, `kubectl version`). Fill in what's available and mark unknowns as "N/A".
+3. **Draft the issue** using this template and present it to the user for review before filing:
+   ```
+   **Describe the Bug**
+   <clear, concise description>
+   **Steps to Reproduce**
+   1. ...
+   2. ...
+   <!-- Include relevant manifests or public container references if applicable -->
+   **Expected Behavior**
+   <what should have happened>
+   **Actual Behavior**
+   <what actually happened — include error messages, logs, or stack traces>
+   **Environment**
+   - **OS:** ...
+   - **Dynamo Runtime Version:** ...
+   - **CPU Architecture:** ...
+   - **CUDA Version:** ...
+   - **GPU Architecture:** ...
+   - **Python Version:** ...
+   <!-- Add K8s-specific fields if applicable -->
+   ```
+4. **Show the draft to the user** and ask for confirmation or edits before filing.
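+5. **File the issue**, writing the approved body to a file first so Markdown, backticks, and quotes survive shell quoting:
+   ```
+   gh issue create --repo ai-dynamo/dynamo --title "<title>" --body-file <body-file>.md
+   ```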
+6. **Return the issue URL** to the user after creation.
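The auto-detection and "N/A" fallback from step 2 can be sketched as follows. This is an illustration under assumptions: the helper names `detect` and `environment_section` are hypothetical, and a command that is missing, exits non-zero, or prints nothing is treated as unknown:

```rust
use std::process::Command;

/// Hypothetical helper: runs a detection command such as `uname -m`.
/// Returns None if the command is missing, fails, or prints nothing.
fn detect(cmd: &str, args: &[&str]) -> Option<String> {
    let out = Command::new(cmd).args(args).output().ok()?;
    if !out.status.success() {
        return None;
    }
    let text = String::from_utf8_lossy(&out.stdout).trim().to_string();
    if text.is_empty() { None } else { Some(text) }
}

/// Renders the **Environment** lines of the issue template,
/// writing "N/A" for every field that could not be detected.
fn environment_section(fields: &[(&str, Option<String>)]) -> String {
    fields
        .iter()
        .map(|(label, value)| format!("- **{label}:** {}", value.as_deref().unwrap_or("N/A")))
        .collect::<Vec<_>>()
        .join("\n")
}
```

A call might look like `environment_section(&[("CPU Architecture", detect("uname", &["-m"]))])`. Fields that cannot be detected by a command, such as the Dynamo runtime version, can be passed as `None` and filled in from conversation context during review.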
--- a/.claude/skills/pr-monitor/SKILL.md
+++ b/.claude/skills/pr-monitor/SKILL.md
+---
+name: pr-monitor
+description: Check CI status, analyze failures, and explain skips for a Dynamo PR
+user-invocable: true
+disable-model-invocation: true
+---
+# PR CI Monitor
+Perform a full health check on a Dynamo pull request. Takes a PR number as argument (e.g., `/dynamo:pr-monitor 6554`); the commands below refer to it as `$PR_NUMBER`.
+## Step 1: PR Overview
+Gather PR metadata and determine what CI should look like for this PR.
+```bash
+gh pr view $PR_NUMBER --repo ai-dynamo/dynamo --json title,body,author,state,isDraft,additions,deletions,changedFiles,labels,reviewDecision,headRefName,baseRefName
+gh pr diff $PR_NUMBER --repo ai-dynamo/dynamo --name-only
+```
+**Check if full CI should be running.** The dynamo repo has two tiers of CI:
+- **Lightweight (`pre-merge.yml`)**: Triggers on all `pull_request` events. Runs pre-commit, copyright checks, DCO, and optionally rust-clippy/rust-tests (if rust filter matches). This always runs.
+- **Full pipeline (`pr.yaml`, `container-validation-dynamo.yml`)**: Triggers on `push` to `main` or `pull-request/[0-9]+` branches ONLY. This includes docker builds, GPU tests, deploy tests. It does NOT trigger on regular PR branches.
+To get full CI on a PR, a `pull-request/$PR_NUMBER` branch must exist (created by copy-pr-bot after an NVIDIA maintainer approves). Check:
+```bash
+gh api repos/ai-dynamo/dynamo/branches/pull-request/$PR_NUMBER 2>/dev/null
+```
+If the branch does not exist, full CI has not been triggered. Common reasons:
+1. **Awaiting approval**: An NVIDIA maintainer needs to comment `/ok to test <commit_sha>` on the PR to create the branch and trigger full CI. This applies to both fork PRs and internal PRs from authors not yet in the approval list.
+2. **DCO failure**: Commits are not signed (`Signed-off-by` line missing). Check for a DCO bot comment. Fix: author signs their commits (see `DCO.md`).
+3. **Draft PR**: Some checks may not run until marked ready for review.
+**Verify which workflows actually ran** against the PR's HEAD commit:
+```bash
+# Get the HEAD SHA
+HEAD_SHA=$(gh pr view $PR_NUMBER --repo ai-dynamo/dynamo --json headRefOid --jq '.headRefOid')
+# Check which workflow runs exist for this SHA
+gh api "repos/ai-dynamo/dynamo/actions/runs?head_sha=$HEAD_SHA" --jq '.workflow_runs[] | {name: .name, status: .status, conclusion: .conclusion}'
+```
+Compare the workflows that ran against what's expected. If only `Pre Merge`, `Copyright Checks`, `DCO Commenter`, etc. appear but NOT `PR` or `Dynamo Validation`, then full CI was not triggered.
+**Determine which CI filters are active.** If full CI ran, read the `changed-files` job output from the actual workflow run — this is the authoritative source for which filters are `true`:
+```bash
+# Find the PR workflow run ID (first match if the SHA has several runs)
+PR_RUN_ID=$(gh api "repos/ai-dynamo/dynamo/actions/runs?head_sha=$HEAD_SHA" --jq '[.workflow_runs[] | select(.name == "PR") | .id] | first')
+# Find the changed-files job, then read its log for the filter results
+CF_JOB_ID=$(gh api "repos/ai-dynamo/dynamo/actions/runs/$PR_RUN_ID/jobs" --jq '.jobs[] | select(.name == "changed-files") | .id')
+gh run view --job "$CF_JOB_ID" --repo ai-dynamo/dynamo --log
+```
+If full CI hasn't run, fall back to fetching `filters.yaml` and matching manually:
+```bash
+gh api repos/ai-dynamo/dynamo/contents/.github/filters.yaml --jq '.content' | base64 -d
+```
+Key rules for manual matching:
+- `core` being true triggers ALL framework pipelines (vllm, sglang, trtllm).
+- Filters use YAML anchors (e.g., `*ci`) — resolve these when reading.
+- Negation patterns like `!**/*.md` in a filter mean markdown-only changes do NOT trigger that filter.
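The manual-matching rules above can be approximated with a small glob matcher. This is a sketch under assumptions: the function names are hypothetical, and the rule "a path counts if it matches at least one positive pattern and no `!`-negated pattern" is an approximation chosen to reproduce the markdown-only behavior described above. The action the real workflow uses may evaluate negations differently, so the `changed-files` job output remains authoritative. YAML anchors and the `core` fan-out are not handled.

```rust
/// Minimal glob matcher (assumed semantics): `**/` matches zero or more whole
/// directories, a bare `**` matches anything, `*` matches within one path
/// segment, and every other byte matches literally.
fn glob_match(pattern: &str, path: &str) -> bool {
    fn go(p: &[u8], s: &[u8]) -> bool {
        if p.is_empty() {
            return s.is_empty();
        }
        if let Some(rest) = p.strip_prefix(b"**/") {
            // Try zero directories, then resume after each `/` in the path.
            return go(rest, s)
                || s.iter()
                    .enumerate()
                    .any(|(i, &c)| c == b'/' && go(rest, &s[i + 1..]));
        }
        if let Some(rest) = p.strip_prefix(b"**") {
            return (0..=s.len()).any(|i| go(rest, &s[i..]));
        }
        if p[0] == b'*' {
            // `*` may consume any run of bytes that contains no `/`.
            return (0..=s.len())
                .take_while(|&i| i == 0 || s[i - 1] != b'/')
                .any(|i| go(&p[1..], &s[i..]));
        }
        !s.is_empty() && s[0] == p[0] && go(&p[1..], &s[1..])
    }
    go(pattern.as_bytes(), path.as_bytes())
}

/// Assumed rule: a path counts when it matches a positive pattern
/// and no negated pattern.
fn path_counts(patterns: &[&str], path: &str) -> bool {
    let mut positive = false;
    for pat in patterns {
        if let Some(negated) = pat.strip_prefix('!') {
            if glob_match(negated, path) {
                return false;
            }
        } else if glob_match(pat, path) {
            positive = true;
        }
    }
    positive
}

/// A filter is `true` if any changed file counts for it.
fn filter_is_true(patterns: &[&str], changed_files: &[&str]) -> bool {
    changed_files.iter().any(|f| path_counts(patterns, f))
}
```

With a hypothetical filter `["lib/**", "!**/*.md"]`, a change to `lib/llm/src/lib.rs` sets it to `true`, while a PR touching only `lib/README.md` leaves it `false`.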
+**Jobs that always run in `pr.yaml` regardless of filters:**
+- `changed-files`, `deploy-operator`, `backend-status-check`, `clean-k8s-builder`, `cleanup` — these run unconditionally or with `if: always()`.
+## Step 2: CI Status Dashboard
+Fetch all check runs. Note: `gh pr checks` returns non-zero exit codes when checks are pending or failed — this is normal, not an error.
+```bash
+gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo
+```
+If no checks are returned at all, refer back to the Step 1 diagnosis (DCO, approval, draft status).
+**Ignore external CI checks** (e.g., GitLab mirror `ci/gitlab/*`). These are NVIDIA-internal pipelines that cannot be inspected from GitHub. Only analyze GitHub Actions checks.
+**Distinguish two situations:**
+1. **Full CI triggered** (workflow runs include `PR`, `Dynamo Validation`): Analyze all jobs normally.
+2. **Only lightweight CI ran** (`Pre Merge` and utility workflows only): Report this clearly. The filter predictions from Step 1 describe what *would* run once full CI triggers, but there's nothing to analyze yet beyond the lightweight checks.
+If full CI ran, **identify the critical path** — which checks are most relevant to this PR's changes based on the filter mapping from Step 1.
+Produce a concise dashboard grouped by status:
+- **Failed** — needs immediate attention
+- **Pending/In-progress** — still running (note which are critical path)
+- **Passed** — healthy (just count, don't enumerate unless asked)
+- **Skipped** — handled in Step 4
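The grouping above is mechanical once each check has a status. A minimal sketch, assuming the checks come from `gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo --json name,bucket` and that the bucket values are `pass`, `fail`, `pending`, `skipping`, and `cancel`. The `Dashboard` type and `build_dashboard` name are hypothetical:

```rust
/// Step 2 dashboard groups. Passed checks are only counted.
#[derive(Debug, Default)]
struct Dashboard {
    failed: Vec<String>,
    pending: Vec<String>,
    passed: usize,
    skipped: Vec<String>,
    cancelled: Vec<String>,
}

/// `checks` holds (check name, bucket) pairs; bucket names are assumed.
fn build_dashboard(checks: &[(&str, &str)]) -> Dashboard {
    let mut d = Dashboard::default();
    for &(name, bucket) in checks {
        // External GitLab mirror checks cannot be inspected from GitHub.
        if name.starts_with("ci/gitlab/") {
            continue;
        }
        match bucket {
            "fail" => d.failed.push(name.to_string()),
            "pass" => d.passed += 1,
            "skipping" => d.skipped.push(name.to_string()),
            "cancel" => d.cancelled.push(name.to_string()),
            // Treat "pending" and anything unrecognized as still running.
            _ => d.pending.push(name.to_string()),
        }
    }
    d
}
```

Treating unknown buckets as pending is a conservative choice. An unrecognized status then gets re-checked instead of being reported as green.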
+## Step 3: Failure Analysis
+For each failed GitHub Actions job, drill into the logs to extract root cause.
+First, identify the failed jobs within each run:
+```bash
+gh api repos/ai-dynamo/dynamo/actions/runs/$RUN_ID/jobs --jq '.jobs[] | select(.conclusion == "failure") | {name: .name, id: .id, html_url: .html_url}'
+```
+Then fetch logs for specific failed jobs. The check URL format from `gh pr checks` is `https://github.com/.../actions/runs/{RUN_ID}/job/{JOB_ID}`. Extract the `RUN_ID` (first number):
+```bash
+gh run view $RUN_ID --repo ai-dynamo/dynamo --log-failed 2>&1 | tail -200
+```
+Note: `--log-failed` concatenates all failed job logs, which can be noisy. For multi-failure runs, prefer fetching per-job to isolate root causes.
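Extracting `RUN_ID` (and `JOB_ID`, for per-job fetching) from a check URL can be sketched as below. The function name is hypothetical. The sketch assumes only the `.../actions/runs/{RUN_ID}/job/{JOB_ID}` shape described above and ignores any query string:

```rust
/// Splits a check URL such as
/// `https://github.com/OWNER/REPO/actions/runs/{RUN_ID}/job/{JOB_ID}`
/// into `(RUN_ID, Some(JOB_ID))`; the job part is optional.
fn parse_check_url(url: &str) -> Option<(u64, Option<u64>)> {
    // Drop any query string or fragment before splitting the path.
    let path = url.split(|c: char| c == '?' || c == '#').next()?;
    let parts: Vec<&str> = path.trim_end_matches('/').split('/').collect();
    let runs = parts.iter().position(|&p| p == "runs")?;
    let run_id: u64 = parts.get(runs + 1)?.parse().ok()?;
    let job_id: Option<u64> = match parts.get(runs + 2) {
        Some(&"job") => parts.get(runs + 3).and_then(|s| s.parse().ok()),
        _ => None,
    };
    Some((run_id, job_id))
}
```

The run ID feeds `gh run view $RUN_ID --log-failed`. The job ID is what you need to read a single job's log in isolation.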
+For each failure, report:
+- **Job name** and which workflow it belongs to
+- **Root cause** — the actual error (compilation error, test assertion, timeout, infra issue, DCO sign-off failure, etc.)
+- **Relevant log excerpt** — the key lines (max 20 lines), not the full dump
+- **Suggested fix** — concrete action the PR author can take
+- If it looks like an infra flake, include: `gh run rerun $RUN_ID --repo ai-dynamo/dynamo --failed`
+If there are no failures, say so and move on.
+## Step 4: Skip & Discrepancy Analysis
+This step is **exception-based only**. Do NOT enumerate expected skips — that's noise.
+**If full CI did not trigger**, skip this step entirely — there are no jobs to analyze. The absence of full CI was already explained in Steps 1-2.
+**If full CI ran**, compare the filter results from Step 1 against actual CI results from Step 2. Only report **surprises**:
+- A filter should be `true` (files matched its paths) but the corresponding job was skipped or missing
+- A gate job (`backend-status-check`, `dynamo-status-check`) was skipped — these use `if: always()` and should always run
+- All framework pipelines skipped when `core` files changed (core should trigger all frameworks)
+**Things that are NOT surprises** (do not report):
+- Jobs skipped because their filter is `false` (e.g., docs jobs skipped when no docs changed)
+- Multi-GPU tests skipped on pre-merge (these are gated to post-merge/nightly)
+- arm64 copy-to-acr jobs skipped (only triggered on merge to main)
+- arm64 GPU tests skipped (GPU tests are amd64-only)
+- Downstream jobs skipped because their upstream was legitimately skipped
+- External CI (GitLab) status — ignored entirely
+- `deploy-operator` running despite `operator=false` — this job always runs in `pr.yaml`
+If everything matches expectations, say "No unexpected skips or discrepancies" and move on.
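The exception-based comparison can be sketched as a function that reports only surprises. The inputs are assumptions:

- which filters evaluated to `true`
- a hand-maintained mapping from each filter to the jobs it should enable
- the observed status of each job

All names here are hypothetical. Jobs behind a `false` filter are never reported, which matches the "not surprises" list above.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum JobStatus {
    Ran,
    Skipped,
}

fn find_surprises(
    filters: &BTreeMap<&str, bool>,
    jobs_for_filter: &BTreeMap<&str, Vec<&str>>,
    observed: &BTreeMap<&str, JobStatus>,
    gate_jobs: &[&str],
) -> Vec<String> {
    let mut surprises = Vec::new();
    for (&filter, &matched) in filters {
        if !matched {
            continue; // filter is false: skips are expected
        }
        for &job in jobs_for_filter.get(filter).map(Vec::as_slice).unwrap_or(&[]) {
            match observed.get(job) {
                Some(JobStatus::Ran) => {}
                Some(JobStatus::Skipped) => {
                    surprises.push(format!("{job} skipped although filter `{filter}` is true"))
                }
                None => surprises.push(format!("{job} missing although filter `{filter}` is true")),
            }
        }
    }
    // Gate jobs use `if: always()`, so any non-run is a surprise.
    for &gate in gate_jobs {
        if observed.get(gate) != Some(&JobStatus::Ran) {
            surprises.push(format!("gate job {gate} did not run"));
        }
    }
    surprises
}
```

An empty result corresponds to "No unexpected skips or discrepancies".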
+## Step 5: Actionable Summary
+Synthesize into a concise report:
+**PR Health: [PASSING | FAILING | PENDING | CI NOT TRIGGERED | PARTIAL — lightweight only]**
+**If full CI not triggered:**
+- "Full CI awaiting approval — an NVIDIA maintainer needs to comment `/ok to test <sha>` to create the `pull-request/$PR_NUMBER` branch."
+- "DCO check failed — commits need to be signed. See DCO.md."
+- "Draft PR — some checks may not run until marked ready for review."
+- Note which lightweight checks passed/failed.
+**Blocking issues** — failures that must be fixed before merge:
+- One-line root cause + suggested fix for each
+- Note if `backend-status-check` or `dynamo-status-check` gate is failing
+**Non-blocking issues** (if any):
+- Flaky tests, infra timeouts, unexpected skips
+**Critical path status** — the checks most relevant to this PR's changes:
+- List them with current status (passed/failed/pending/not triggered)
+- If pending, suggest re-checking in ~15 minutes
+**Next steps** — concrete actions ordered by priority:
+- "An NVIDIA maintainer should comment `/ok to test <sha>`" if full CI hasn't triggered
+- "Sign your commits with `git commit --amend -s` (latest commit) or `git rebase --signoff <base-branch>` (all commits), then force-push" for DCO failures
+- "Fix X in file Y" for code failures
+- Re-run command for infra flakes: `gh run rerun $RUN_ID --repo ai-dynamo/dynamo --failed`
+- "No action needed — CI is green" if everything passed
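The skill lists the health labels but not how they rank when several conditions hold at once. The sketch below shows one possible precedence. It is an assumption, not something the skill defines: no checks at all, then any failure, then lightweight-only, then pending, then passing. The enum and function names are hypothetical.

```rust
/// PR health labels from Step 5.
#[derive(Debug, PartialEq)]
enum Health {
    Passing,
    Failing,
    Pending,
    CiNotTriggered,
    PartialLightweightOnly,
}

/// One possible precedence; adjust if lightweight failures should be
/// reported as PARTIAL rather than FAILING.
fn pr_health(any_checks: bool, full_ci_ran: bool, failed: usize, pending: usize) -> Health {
    if !any_checks {
        Health::CiNotTriggered
    } else if failed > 0 {
        Health::Failing
    } else if !full_ci_ran {
        Health::PartialLightweightOnly
    } else if pending > 0 {
        Health::Pending
    } else {
        Health::Passing
    }
}
```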
+## Step 6: Monitor Pending Checks
+If any checks are still pending or in-progress, offer to monitor them.
+**List remaining checks:**
+```bash
+gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo --json name,bucket --jq '.[] | select(.bucket == "pending") | .name'
+```
+Report:
+- How many checks are still pending
+- Which ones are on the critical path
+- Estimated wait time based on similar completed jobs (e.g., if `vllm-cuda12.9-amd64 / Test` took 20m and `vllm-cuda13.0-amd64 / Test` is still running, estimate ~20m remaining)
+If the user wants to wait, poll periodically:
+```bash
+# Re-check status
+gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo --json bucket --jq '[.[] | select(.bucket != "pending")] | length'  # completed count
+gh pr checks $PR_NUMBER --repo ai-dynamo/dynamo --json bucket --jq '[.[] | select(.bucket == "pending")] | length'  # remaining count
+```
+When all checks complete, re-run the summary from Step 5 with final results. Report any checks that changed from pending to failed since the last check.
+## Behavior Notes
+- **Concurrency cancellation**: If a PR has rapid pushes, earlier runs get cancelled. Note if you see cancelled runs and suggest checking the latest run instead.
+- **Large log output**: Always truncate to relevant sections. Never dump more than 50 lines of raw log in the summary.
+- **Rate limits**: If `gh` commands fail due to rate limiting, report what you could gather and suggest retrying later.
+- **Multiple workflows**: A single push can trigger `pr.yaml`, `pre-merge.yml`, and `container-validation-dynamo.yml`. Check all of them.
+- **`pull-request/[0-9]+` branches**: Created by copy-pr-bot after maintainer approval. Required for full CI — applies to both fork and internal PRs.
+- **`external-contribution` label**: Fork PRs get this label automatically. Its presence confirms the PR is from an external contributor.
+- **External CI (GitLab)**: Ignore `ci/gitlab/*` checks entirely. These are NVIDIA-internal and cannot be diagnosed from GitHub.
--- a/.claude/skills/tool-parser-generator/README.md
+++ b/.claude/skills/tool-parser-generator/README.md
+# Tool Parser Generator Skill
+A Claude Code skill for adding tool calling support to dynamo by analyzing HuggingFace model chat templates.
+## Overview
+This skill provides a systematic workflow for:
+1. Fetching chat templates from HuggingFace models
+2. Analyzing tool call patterns and formats
+3. Matching against existing dynamo parsers
+4. Generating new parser implementations when needed
+5. Creating appropriate tests and integration code
+## Key Features
+- **LLM-Driven**: Leverages Claude's code analysis and generation capabilities
+- **Minimal Changes**: Prefers configuration over new code when possible
+- **Reference-Aware**: Compares with sglang and vLLM implementations
+- **Test Generation**: Automatically creates comprehensive tests
+- **Well-Documented**: Includes examples and integration guides
+## Usage
+Simply ask Claude to add tool calling support for a model:
+```
+Add tool calling support for Qwen/Qwen2.5-72B-Instruct
+```
+Claude will:
+1. Fetch the model's tokenizer config from HuggingFace
+2. Extract and analyze the chat template
+3. Compare with existing dynamo parsers
+4. Either configure an existing parser or generate a new one
+5. Create tests and integration instructions
+## Structure
+```
+tool-parser-generator/
+├── SKILL.md                       # Main skill documentation with workflow
+├── README.md                      # This file
+└── references/
+    ├── parser-patterns.md         # Quick reference for common patterns
+    └── integration-guide.md       # Step-by-step integration instructions
+```
+## Workflow Phases
+1. **Fetch & Extract**: Get chat template from HuggingFace Hub
+2. **Analyze**: Identify markers, format type, and structure
+3. **Compare**: Match against existing dynamo parsers
+4. **Generate/Configure**: Create parser or configuration
+5. **Test**: Generate comprehensive test cases
+6. **Integrate**: Add to dynamo codebase with proper registration
+## Philosophy
+- **Prefer Existing**: Most models (>80%) can use existing parsers
+- **Minimal Code**: Configuration over implementation when possible
+- **Well-Tested**: Every parser needs comprehensive tests
+- **Reference-Driven**: Learn from sglang and vLLM implementations
+## References
+- **Dynamo Parsers**: `/lib/parsers/src/tool_calling/`
+- **sglang**: https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/function_call
+- **vLLM**: https://github.com/vllm-project/vllm/tree/main/vllm/tool_parsers
+- **HuggingFace**: https://huggingface.co/docs/transformers/chat_templating
+## Example
+See `SKILL.md` for a complete walkthrough of adding support for Qwen/Qwen2.5-72B-Instruct.
+## License
+Apache 2.0 - See top-level LICENSE for details.
--- a/.claude/skills/tool-parser-generator/SKILL.md
+++ b/.claude/skills/tool-parser-generator/SKILL.md
+---
+name: tool-parser-generator
+description: Generate optimized tool call parsers for dynamo from HuggingFace model chat templates. Use this when you need to add support for a new model's tool calling format. Takes a HuggingFace model name, analyzes its chat template, compares with existing parsers, and either maps to existing parser or generates new Rust code with tests for the dynamo tool_calling library.
+license: "Apache-2.0"
+---
+# Tool Parser Generator Skill
+Add support for new models' tool calling formats by analyzing their chat templates and generating appropriate parser implementations for dynamo.
+## When to Use This Skill
+- User asks to add tool calling support for a specific HuggingFace model
+- User wants to understand how a model structures tool calls
+- User needs to extend dynamo's parser library with new formats
+## Workflow
+Follow this systematic workflow when the user provides a HuggingFace model name.
+### Phase 1: Fetch and Extract Chat Template
+1. **Fetch tokenizer config from HuggingFace Hub**:
+   ```
+   URL: https://huggingface.co/{model_id}/resolve/main/tokenizer_config.json
+   ```
+2. **Extract chat template**:
+   - Parse the JSON response
+   - Look for `chat_template` field
+   - Handle two formats:
+     - String: Single template
+     - Array: List of templates with `name` and `template` fields
+       - Prefer `tool_use` template if available
+       - Fall back to `default` template
+3. **Extract special tokens** (if relevant):
+   - `bos_token`, `eos_token`, `unk_token`
+   - `additional_special_tokens`
+   - Any tool-specific tokens in the config
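The template-selection rule above (string as-is; otherwise prefer `tool_use`, then `default`) can be sketched as follows. `ChatTemplate` and `NamedTemplate` are hypothetical stand-ins for the two JSON shapes of `chat_template`. Real code would deserialize them (e.g., with serde) instead of building them by hand.

```rust
/// Hypothetical model of the two shapes of `chat_template`.
enum ChatTemplate {
    Single(String),
    Named(Vec<NamedTemplate>),
}

struct NamedTemplate {
    name: String,
    template: String,
}

/// Picks the template to analyze: prefer `tool_use`, fall back to `default`.
fn select_template(t: &ChatTemplate) -> Option<&str> {
    match t {
        ChatTemplate::Single(s) => Some(s.as_str()),
        ChatTemplate::Named(list) => ["tool_use", "default"]
            .iter()
            .find_map(|want| list.iter().find(|n| n.name == *want))
            .map(|n| n.template.as_str()),
    }
}
```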
+### Phase 2: Analyze Chat Template
+The chat template is a Jinja template. Analyze it to identify tool call patterns:
+1. **Find tool-related sections**:
+   - Look for conditional blocks with keywords: `tools`, `tool_call`, `function`, `available_tools`
+   - Extract content within `{% if tools %}...{% endif %}` blocks
+   - Find `{% for tool in tools %}` loops
+2. **Identify markers and format**:
+   - **Start markers**: Tokens/strings before tool calls
+     - Examples: `<tool_call>`, `[TOOL_CALLS]`, `<|python_tag|>`, `<｜tool▁call▁begin｜>`
+   - **End markers**: Tokens/strings after tool calls
+     - Examples: `</tool_call>`, `[/TOOL_CALLS]`, `<｜tool▁call▁end｜>`
+   - **Special tokens**: Unicode or encoded tokens (DeepSeek, Harmony)
+   - **Format type**:
+     - JSON: Look for `tojson` filter, `{` `}` brackets
+     - XML: Look for `<function=`, `<parameter=` patterns
+     - Pythonic: Look for `function(arg=val)` patterns
+     - DSML: Look for `<｜DSML｜` tokens
+3. **Identify JSON structure** (if JSON format):
+   - Name key: Usually `name` or `function`
+   - Arguments key: Usually `arguments` or `parameters`
+   - Array vs single object
+   - Multiple calls handling
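A rough first-pass classifier for the format cues listed above might look like this. It is a heuristic sketch, not a replacement for reading the template. Checks run from most to least specific, and anything unrecognized (including Pythonic calls, which are hard to spot by substring) is left for manual review. The names and the marker list (taken from the examples above) are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum ToolFormat {
    Dsml,
    Xml,
    Json,
    Unknown, // Pythonic or custom: inspect manually
}

/// Heuristic guess from substrings of the Jinja template.
fn guess_format(template: &str) -> ToolFormat {
    if template.contains("<｜DSML｜") {
        ToolFormat::Dsml
    } else if template.contains("<function=") || template.contains("<parameter=") {
        ToolFormat::Xml
    } else if template.contains("tojson") {
        ToolFormat::Json
    } else {
        ToolFormat::Unknown
    }
}

/// Start markers from the examples above that appear in the template.
const KNOWN_START_MARKERS: &[&str] = &["<tool_call>", "[TOOL_CALLS]", "<|python_tag|>", "<｜tool▁call▁begin｜>"];

fn known_start_markers(template: &str) -> Vec<&'static str> {
    KNOWN_START_MARKERS.iter().copied().filter(|m| template.contains(*m)).collect()
}
```

XML is checked before JSON because XML-style templates can still use `tojson` elsewhere (for example, when rendering tool definitions).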
+### Phase 3: Compare with Existing Parsers
+**Read existing parser implementations** in `/lib/parsers/src/tool_calling/`:
+1. **Check JSON parsers** (`json/` directory):
+   - `base_json_parser.rs` - Generic JSON with markers
+   - `deepseek_v3_parser.rs` - DeepSeek V3 format
+   - `deepseek_v3_1_parser.rs` - DeepSeek V3.1 format
+2. **Check XML parsers** (`xml/` directory):
+   - `parser.rs` - Qwen3 Coder XML format
+3. **Check other formats**:
+   - `pythonic/pythonic_parser.rs` - Python syntax
+   - `harmony/harmony_parser.rs` - Harmony protocol
+   - `dsml/parser.rs` - DeepSeek V3.2 DSML
+4. **Review config presets** in `config.rs`:
+   - Look at `ToolCallConfig::hermes()`, `mistral()`, `llama3_json()`, etc.
+   - Each preset defines start/end tokens, key names, parser type
+5. **Check parser registry** in `parsers.rs`:
+   - See how parsers are registered in `get_tool_parser_map()`
+   - Understand the `ParserType` enum and routing logic
+**Match the analyzed format**:
+- If start/end tokens and format match existing parser → Use existing parser with config
+- If similar but different tokens → Adapt existing parser config
+- If completely different format → Generate new parser
+### Phase 4: Generate or Configure Parser
+#### Option A: Use Existing Parser (Preferred)
+If a match is found, create a configuration preset:
+1. Add a new preset function to `/lib/parsers/src/tool_calling/config.rs`:
+   ```rust
+   impl ToolCallConfig {
+       pub fn new_model_name() -> Self {
+           Self {
+               config: ParserConfig::Json(JsonParserConfig {
+                   start_token: Some("<marker>".to_string()),
+                   end_token: Some("</marker>".to_string()),
+                   function_name_key: Some("name".to_string()),
+                   function_arguments_key: Some("arguments".to_string()),
+                   parser_type: JsonParserType::Basic,
+               }),
+           }
+       }
+   }
+   ```
+2. Register in parser map in `/lib/parsers/src/tool_calling/parsers.rs`
+3. **Create tests** to verify the configuration works
+#### Option B: Generate New Parser (If Needed)
+If no existing parser fits, generate new parser code:
+1. **Choose parser template** based on format:
+   - JSON format → Use `base_json_parser.rs` as template
+   - XML format → Use `xml/parser.rs` as template
+   - Custom format → Implement three core functions
+2. **Implement required functions**:
+   ```rust
+   // Detection
+   pub fn detect_tool_call_start_<name>(chunk: &str, config: &Config) -> bool
+   // Parsing
+   pub fn try_tool_call_parse_<name>(
+       message: &str,
+       config: &Config,
+       tools: Option<&[ToolDefinition]>,
+   ) -> Result<(Vec<ToolCallResponse>, Option<String>)>
+   // End detection (for streaming)
+   pub fn find_tool_call_end_position_<name>(chunk: &str, config: &Config) -> usize
+   ```
+3. **Use regex for token matching**:
+   - Use `OnceLock<Regex>` for compiled regexes
+   - Escape special characters properly
+   - Handle partial tokens for streaming
+4. **Parse JSON/XML content**:
+   - Use `serde_json` for JSON parsing
+   - Use regex for XML extraction (or XML parser if complex)
+   - Build `ToolCallResponse` structs
+5. **Add to appropriate directory**:
+   - JSON variants → `json/` directory
+   - XML variants → `xml/` directory
+   - New format → Create new subdirectory
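Two core ideas behind a marker-based parser can be shown with std only:

- detecting a start marker that may be split across streamed chunks
- splitting a complete message into normal text and tool-call bodies

Both functions are hypothetical illustrations, not dynamo's `detect_tool_call_start_<name>` or `try_tool_call_parse_<name>`. Decoding each body into a `ToolCallResponse` (e.g., with `serde_json`) is omitted.

```rust
/// True if `chunk` contains `start`, or ends with a proper prefix of it,
/// i.e. the marker may be split across two streamed chunks.
fn may_contain_start(chunk: &str, start: &str) -> bool {
    chunk.contains(start)
        || (1..start.len())
            .filter(|&n| start.is_char_boundary(n)) // stay on UTF-8 boundaries
            .any(|n| chunk.ends_with(&start[..n]))
}

/// Splits a complete message into (normal text, tool-call bodies) for one
/// start/end marker pair. An unterminated call is left in the normal text.
fn split_tool_calls<'a>(msg: &'a str, start: &str, end: &str) -> (String, Vec<&'a str>) {
    let mut normal = String::new();
    let mut bodies = Vec::new();
    let mut rest = msg;
    while let Some(s) = rest.find(start) {
        let after_start = &rest[s + start.len()..];
        match after_start.find(end) {
            Some(e) => {
                normal.push_str(&rest[..s]);
                bodies.push(after_start[..e].trim());
                rest = &after_start[e + end.len()..];
            }
            None => break, // no end marker yet: keep the remainder as text
        }
    }
    normal.push_str(rest);
    (normal, bodies)
}
```

The `is_char_boundary` filter matters for multi-byte markers like `<｜tool▁call▁begin｜>`, where slicing at an arbitrary byte offset would panic.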
+### Phase 5: Generate Tests
+For any new parser or configuration, generate comprehensive tests:
+1. **Basic tests**:
+   - Detection of start markers
+   - Parsing single tool call
+   - Parsing multiple tool calls
+   - Normal text extraction
+2. **Edge cases**:
+   - Empty arguments
+   - Missing fields
+   - Malformed JSON/XML
+   - Partial tokens (streaming)
+3. **Integration tests**:
+   - End-to-end with real model outputs (if available)
+   - Tool validation (if tools list provided)
+4. **Add tests** to appropriate location:
+   - Inline in parser file (in `#[cfg(test)]` module)
+   - Or in `/lib/parsers/src/tool_calling/tests.rs`
+### Phase 6: Integration
+1. **Update module exports**:
+   - Add `mod` declaration in parent `mod.rs`
+   - Export functions as needed
+2. **Register parser** in `parsers.rs` if new parser:
+   - Add to `get_tool_parser_map()` function
+   - **CRITICAL**: Update `test_get_available_tool_parsers()` test
+   - Add your new parser name to the `available_parsers` array in the test
+3. **Document the parser**:
+   - Add doc comments explaining format
+   - Include example input/output
+   - Reference model family
+4. **Run tests**:
+   ```bash
+   cd lib/parsers
+   cargo test tool_calling
+   ```
+5. **Verify with dynamo**:
+   - Test with actual model if possible
+   - Verify streaming behavior
+   - Check error handling
+## Key Reference Files
+**Dynamo Codebase**:
+- `/lib/parsers/src/tool_calling/` - All tool call parsers
+- `/lib/parsers/src/tool_calling/config.rs` - Configuration presets
+- `/lib/parsers/src/tool_calling/parsers.rs` - Parser registry
+- `/lib/llm/src/preprocessor/prompt/template/tokcfg.rs` - Chat template structures
+- `/lib/llm/src/preprocessor/prompt/template.rs` - Template loading
+**Reference Implementations**:
+- **sglang**: https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/function_call
+  - Look at detector pattern (base_format_detector.py)
+  - Model-specific detectors (qwen25_detector.py, deepseekv3_detector.py, etc.)
+- **vLLM**: https://github.com/vllm-project/vllm/tree/main/vllm/tool_parsers
+  - Look at abstract_tool_parser.py
+  - Model-specific parsers (llama_tool_parser.py, qwen3xml_tool_parser.py, etc.)
+- **HuggingFace**: https://huggingface.co/docs/transformers/chat_templating
+## Example: Adding Support for a New Model
+User: "Add tool calling support for Qwen/Qwen2.5-72B-Instruct"
+**Step 1**: Fetch tokenizer config
+- Use WebFetch to get `https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/resolve/main/tokenizer_config.json`
+**Step 2**: Analyze chat template
+- Extract `chat_template` field
+- Identify `{% if tools %}` block
+- Find markers: Likely `<tool_call>` and `</tool_call>`
+- Identify format: Check for JSON with `tojson` filter
+**Step 3**: Compare with existing parsers
+- Read `/lib/parsers/src/tool_calling/config.rs`
+- Check `ToolCallConfig::hermes()` - uses `<tool_call>` markers
+- Check if Qwen format matches hermes format
+**Step 4**: Use or adapt existing parser
+- If matches hermes: Create `qwen2_5()` config preset
+- If different: Generate new parser or adapt base_json_parser
+**Step 5**: Generate tests
+- Create test cases with example Qwen tool calls
+- Test detection, parsing, and edge cases
+**Step 6**: Integrate
+- Add config preset to `config.rs`
+- Register in parser map (`get_tool_parser_map()`)
+- Update `test_get_available_tool_parsers()` test
+- Run tests
+- Document
+## Tips
+- **Always prefer existing parsers**: Most models can use existing parsers with different configs
+- **Read reference implementations**: sglang and vLLM often have parsers for popular models
+- **Use WebFetch for HF models**: Don't assume - always fetch actual tokenizer config
+- **Test with real outputs**: If possible, get actual model outputs to test against
+- **Keep it simple**: Prefer straightforward regex over complex parsing when possible
+- **Document well**: Future you (or others) will thank you
+## Common Patterns
+### JSON with Brackets
+```
+[TOOL_CALLS] [{"name": "func", "arguments": {}}]
+```
+→ Use `base_json_parser` with bracket markers
+### JSON with XML Tags
+```xml
+<tool_call>
+{"name": "func", "arguments": {}}
+</tool_call>
+```
+→ Use `base_json_parser` with XML-style markers
+### XML Structure
+```xml
+<tool_call>
+<function=name>
+<parameter=key>value</parameter>
+</function>
+</tool_call>
+```
+→ Use `xml/parser.rs` or create variant
+### Nested Tokens
+```
+<｜tool▁call▁begin｜>name<｜tool▁sep｜>args<｜tool▁call▁end｜>
+```
+→ Create specialized parser (see DeepSeek parsers)
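+The inline DeepSeek-V3.1 shape can be split on its special tokens. Below is a hypothetical std-only sketch; `parse_nested` is illustrative, not the API of the DeepSeek parsers. V3 also wraps the arguments in a markdown `json` code block, which would need to be stripped first.
+```rust
+const TOOL_BEGIN: &str = "<｜tool▁call▁begin｜>";
+const TOOL_SEP: &str = "<｜tool▁sep｜>";
+const TOOL_END: &str = "<｜tool▁call▁end｜>";
+/// Return (function name, raw argument string) for each complete call.
+/// The tokens are multi-byte UTF-8. `find` returns byte offsets and `len()`
+/// counts bytes, so every slice stays on a character boundary.
+fn parse_nested(message: &str) -> Vec<(String, String)> {
+    let mut calls = Vec::new();
+    let mut rest = message;
+    while let Some(b) = rest.find(TOOL_BEGIN) {
+        let body = &rest[b + TOOL_BEGIN.len()..];
+        // Stop at an unfinished call (e.g. mid-stream).
+        let Some(e) = body.find(TOOL_END) else { break };
+        let call = &body[..e];
+        if let Some(s) = call.find(TOOL_SEP) {
+            let name = call[..s].trim().to_string();
+            let args = call[s + TOOL_SEP.len()..].trim().to_string();
+            calls.push((name, args));
+        }
+        rest = &body[e + TOOL_END.len()..];
+    }
+    calls
+}
+```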
+## Minimal Changes Philosophy
+1. **First**: Try existing parser with new config
+2. **Second**: Adapt existing parser with minor tweaks
+3. **Last resort**: Create entirely new parser
+Most models (>80%) can use existing parsers with appropriate configuration.
--- a/.claude/skills/tool-parser-generator/references/integration-guide.md
+++ b/.claude/skills/tool-parser-generator/references/integration-guide.md
+# Parser Integration Guide
+Step-by-step guide for integrating new parsers or configurations into dynamo.
+## Option 1: Add Configuration Preset (Most Common)
+When an existing parser can handle the new model with a different configuration.
+### Step 1: Add Config Preset
+Edit `/lib/parsers/src/tool_calling/config.rs`:
+```rust
+impl ToolCallConfig {
+    /// Configuration for ModelName
+    pub fn model_name() -> Self {
+        Self {
+            config: ParserConfig::Json(JsonParserConfig {
+                start_token: Some("<start>".to_string()),
+                end_token: Some("</start>".to_string()),
+                function_name_key: Some("name".to_string()),
+                function_arguments_key: Some("arguments".to_string()),
+                parser_type: JsonParserType::Basic,
+            }),
+        }
+    }
+}
+```
+### Step 2: Register in Parser Map
+Edit `/lib/parsers/src/tool_calling/parsers.rs`:
+In the `get_tool_parser_map()` function:
+```rust
+// Register the preset from Step 1 under the name users pass as the parser
+map.insert("model_name", ToolCallConfig::model_name());
+```
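+Conceptually, the registry is a name-to-preset lookup. That is why a missing `insert` shows up as "parser not found". Below is a hypothetical std-only model of that lookup; `Preset`, `parser_map`, and `lookup` are illustrative names, not Dynamo's types:
+```rust
+use std::collections::HashMap;
+use std::sync::OnceLock;
+/// Stand-in for a config preset: only the start/end markers.
+#[derive(Debug)]
+struct Preset {
+    start: &'static str,
+    end: &'static str,
+}
+/// Built once on first use, then shared for the life of the process.
+fn parser_map() -> &'static HashMap<&'static str, Preset> {
+    static MAP: OnceLock<HashMap<&'static str, Preset>> = OnceLock::new();
+    MAP.get_or_init(|| {
+        let mut map = HashMap::new();
+        map.insert("hermes", Preset { start: "<tool_call>", end: "</tool_call>" });
+        // A new model needs its own entry here.
+        map.insert("model_name", Preset { start: "<start>", end: "</start>" });
+        map
+    })
+}
+fn lookup(name: &str) -> Result<&'static Preset, String> {
+    parser_map()
+        .get(name)
+        .ok_or_else(|| format!("parser not found: {name}"))
+}
+```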
+### Step 3: Add Tests
+In the same file or in `tests.rs`:
+```rust
+#[test]
+fn test_model_name_parser() {
+    // The parser name must match the key registered in get_tool_parser_map()
+    let message = r#"<start>{"name": "test", "arguments": {}}</start>"#;
+    let (calls, _) = detect_and_parse_tool_call(message, Some("model_name"), None).unwrap();
+    assert_eq!(calls.len(), 1);
+    assert_eq!(calls[0].function.name, "test");
+}
+```
+### Step 4: Run Tests
+```bash
+cd lib/parsers
+cargo test model_name
+```
+## Option 2: Create New Parser
+When the format is truly unique and doesn't fit existing parsers.
+### Step 1: Choose Directory
+- JSON variant → `json/`
+- XML variant → `xml/`
+- New format → Create new subdirectory
+### Step 2: Create Parser File
+Example: `json/model_name_parser.rs`
+```rust
+// SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES.
+// SPDX-License-Identifier: Apache-2.0
+use anyhow::Result;
+use regex::Regex;
+use std::sync::OnceLock;
+use crate::tool_calling::{
+    config::{JsonParserConfig, ToolDefinition},
+    response::{CalledFunction, ToolCallResponse, ToolCallType},
+};
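+// Compile regexes once, e.g. DETECT_REGEX.get_or_init(|| Regex::new(r"...").unwrap())
+static DETECT_REGEX: OnceLock<Regex> = OnceLock::new();
+/// Returns true if `chunk` contains the start of a tool call in this format.
+pub fn detect_tool_call_start_model_name(
+    chunk: &str,
+    config: &JsonParserConfig,
+) -> bool {
+    // Implementation: check `chunk` against the start token(s) in `config`
+    todo!()
+}
+/// Parses all tool calls in `message` and returns them together with any
+/// remaining normal (non-tool-call) text.
+pub fn try_tool_call_parse_model_name(
+    message: &str,
+    config: &JsonParserConfig,
+    tools: Option<&[ToolDefinition]>,
+) -> Result<(Vec<ToolCallResponse>, Option<String>)> {
+    // Implementation: extract payloads, build ToolCallResponse values,
+    // and validate function names against `tools` when it is Some
+    todo!()
+}
+/// Returns the position in `chunk` where the tool call ends (streaming path).
+pub fn find_tool_call_end_position_model_name(
+    chunk: &str,
+    config: &JsonParserConfig,
+) -> usize {
+    // Implementation
+    todo!()
+}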
+#[cfg(test)]
+mod tests {
+    use super::*;
+    #[test]
+    fn test_detection() {
+        // Tests
+    }
+}
+```
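+The sketch below makes the streaming side of the contract concrete. It is a hypothetical std-only version of the two streaming helpers for a `<start>JSON</start>` format, and it hard-codes the markers instead of reading them from `JsonParserConfig`. Two behaviors here are assumptions about what a streaming caller needs, not Dynamo's documented contract: a trailing partial marker counts as a possible start, and `chunk.len()` is returned while a call is unfinished.
+```rust
+const MODEL_START: &str = "<start>";
+const MODEL_END: &str = "</start>";
+/// True if `chunk` contains the start marker, or ends with a partial prefix
+/// of it (e.g. "<sta"), so a streaming caller keeps buffering.
+fn detect_start(chunk: &str) -> bool {
+    chunk.contains(MODEL_START)
+        || (1..MODEL_START.len()).any(|n| chunk.ends_with(&MODEL_START[..n]))
+}
+/// Byte offset just past the first complete `<start>...</start>` block,
+/// or `chunk.len()` if there is no block or it has not finished yet.
+fn end_position(chunk: &str) -> usize {
+    let Some(s) = chunk.find(MODEL_START) else {
+        return chunk.len();
+    };
+    match chunk[s..].find(MODEL_END) {
+        Some(e) => s + e + MODEL_END.len(),
+        None => chunk.len(),
+    }
+}
+```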
+### Step 3: Update mod.rs
+In the parent directory's `mod.rs`:
+```rust
+pub mod model_name_parser;
+pub use model_name_parser::*;
+```
+### Step 4: Add Config Enum Variant
+Edit `config.rs` to add your parser type if needed:
+```rust
+pub enum ParserConfig {
+    Json(JsonParserConfig),
+    Xml(XmlParserConfig),
+    // ... existing variants
+    ModelName(ModelNameConfig), // If you need custom config
+}
+```
+### Step 5: Register in parsers.rs
+Add routing logic in `try_tool_call_parse()`:
+```rust
+match config {
+    // ... existing matches
+    ParserConfig::ModelName(cfg) => {
+        model_name_parser::try_tool_call_parse_model_name(message, cfg, tools)
+    }
+}
+```
+### Step 6: Run All Tests
+```bash
+cd lib/parsers
+cargo test tool_calling
+cargo clippy
+cargo fmt
+```
+## File Structure
+```
+lib/parsers/src/tool_calling/
+├── mod.rs                   # Main module exports
+├── config.rs                # Configurations and presets
+├── parsers.rs               # Parser routing and registry
+├── response.rs              # Response types
+├── tools.rs                 # High-level APIs
+├── tests.rs                 # Integration tests
+│
+├── json/                    # JSON parsers
+│   ├── mod.rs
+│   ├── base_json_parser.rs
+│   ├── deepseek_v3_parser.rs
+│   ├── deepseek_v3_1_parser.rs
+│   └── model_name_parser.rs  # Your new parser
+│
+├── xml/                     # XML parsers
+│   ├── mod.rs
+│   └── parser.rs
+│
+├── pythonic/                # Pythonic parsers
+├── harmony/                 # Harmony parsers
+└── dsml/                    # DSML parsers
+```
+## Testing Strategy
+### Unit Tests
+Test individual functions in the parser file:
+- `detect_tool_call_start_*`
+- `try_tool_call_parse_*`
+- `find_tool_call_end_position_*`
+### Integration Tests
+Test through the main API in `tests.rs`:
+- `detect_and_parse_tool_call()` with parser name
+- Streaming behavior
+- Tool validation
+### Example Output Tests
+Use real model outputs when possible:
+- Get actual tool call from model
+- Verify parsing produces correct structure
+- Check normal text extraction
+## Common Issues
+### Issue: Parser not found
+**Solution**: Check registration in `get_tool_parser_map()`
+### Issue: Detection works but parsing fails
+**Solution**: Check regex patterns and JSON structure keys
+### Issue: Streaming produces wrong results
+**Solution**: Verify `find_tool_call_end_position_*` implementation
+### Issue: Tests fail with "unknown tool"
+**Solution**: Either pass a `tools` list that declares every called function, or pass `None` for `tools` so names are not validated
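+The check behind "unknown tool" can be modeled as below. This is a hypothetical std-only sketch that uses plain names in place of `ToolDefinition`. It assumes, as the solution above implies, that passing no tool list skips validation.
+```rust
+/// With `Some(tools)`, every called name must be declared; with `None`,
+/// validation is skipped.
+fn validate_calls(called: &[&str], tools: Option<&[&str]>) -> Result<(), String> {
+    let Some(tools) = tools else {
+        return Ok(());
+    };
+    for name in called {
+        if !tools.contains(name) {
+            return Err(format!("unknown tool: {name}"));
+        }
+    }
+    Ok(())
+}
+```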
+## Documentation
+Add doc comments to your parser:
+```rust
+//! ModelName tool call parser
+//!
+//! Format: <start>JSON</start>
+//!
+//! Example:
+//! ```text
+//! <start>{"name": "func", "arguments": {}}</start>
+//! ```
+//!
+//! Models: ModelName family
+```
+## Checklist
+- [ ] Parser implements three required functions
+- [ ] Config preset added (if using existing parser)
+- [ ] Parser registered in map
+- [ ] Unit tests pass
+- [ ] Integration tests pass
+- [ ] Documentation added
+- [ ] Clippy warnings resolved
+- [ ] Code formatted with rustfmt
--- a/.claude/skills/tool-parser-generator/references/parser-patterns.md
+++ b/.claude/skills/tool-parser-generator/references/parser-patterns.md
+# Tool Call Parser Patterns Reference
+Quick reference for common tool call patterns found in LLM chat templates.
+## Pattern Categories
+### 1. JSON with Special Tokens
+#### Bracket Markers (Mistral-style)
+```
+[TOOL_CALLS] [{"name": "get_weather", "arguments": {"location": "NYC"}}]
+```
+- Models: Mistral, Mixtral
+- Parser: `base_json_parser` with bracket config
+- Keys: `name`, `arguments`
+#### XML-Style Tags (Hermes-style)
+```xml
+<tool_call>
+{"name": "get_weather", "arguments": {"location": "NYC"}}
+</tool_call>
+```
+- Models: Hermes-2, Jamba
+- Parser: `base_json_parser` with XML-style markers
+- Keys: `name`, `arguments`
+#### Single Token Prefix (Llama-style)
+```
+<|python_tag|>[{"name": "get_weather", "arguments": {"location": "NYC"}}]
+```
+- Models: Llama 3.1, Llama 3.2
+- Parser: `base_json_parser` with single start token
+- Keys: `name`, and `arguments` or `parameters` (Meta's documented Llama 3.1 JSON format uses `parameters`)
+### 2. XML-Based
+#### Qwen3 Coder Style
+```xml
+<tool_call>
+<function=get_weather>
+<parameter=location>NYC</parameter>
+</function>
+</tool_call>
+```
+- Models: Qwen3-Coder, Nemotron-Nano
+- Parser: `xml/parser.rs`
+- Attribute-based names and parameters
+### 3. Nested Special Tokens
+#### DeepSeek V3
+````
+<｜tool▁call▁begin｜>function<｜tool▁sep｜>get_weather
+```json
+{"location": "NYC"}
+```
+<｜tool▁call▁end｜>
+````
+- Models: DeepSeek-V3
+- Parser: `deepseek_v3_parser.rs`
+- Multiline with markdown code blocks
+#### DeepSeek V3.1
+```
+<｜tool▁call▁begin｜>get_weather<｜tool▁sep｜>{"location": "NYC"}<｜tool▁call▁end｜>
+```
+- Models: DeepSeek-V3.1
+- Parser: `deepseek_v3_1_parser.rs`
+- Inline JSON
+### 4. DSML (DeepSeek V3.2)
+```xml
+<｜DSML｜function_calls>
+<｜DSML｜invoke name="get_weather">
+<｜DSML｜parameter name="location" string="true">NYC</｜DSML｜parameter>
+</｜DSML｜invoke>
+</｜DSML｜function_calls>
+```
+- Models: DeepSeek-V3.2
+- Parser: `dsml/parser.rs`
+- Explicit parameter types
+### 5. Pythonic
+```python
+[get_weather(location="NYC"), get_time(timezone="EST")]
+```
+- Models: Custom/Experimental
+- Parser: `pythonic/pythonic_parser.rs`
+- Python function call syntax
+### 6. Harmony
+```
+<|channel|>commentary to=functions.get_weather
+<|constrain|>json
+<|message|>{"location": "NYC"}
+```
+- Models: GPT-OSS
+- Parser: `harmony/harmony_parser.rs`
+- OpenAI Harmony protocol
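+The following is a hypothetical std-only sketch of extracting the function name and JSON payload from the Harmony example above. `parse_harmony` is illustrative, not the API of `harmony_parser.rs`. It assumes the name follows `to=functions.` and that the payload runs to the end of the text or to a `<|call|>` token.
+```rust
+/// Hypothetical helper: extract (function name, raw JSON arguments) from one
+/// Harmony tool-call message.
+fn parse_harmony(msg: &str) -> Option<(String, String)> {
+    const RECIPIENT: &str = "to=functions.";
+    const MESSAGE: &str = "<|message|>";
+    let name_start = msg.find(RECIPIENT)? + RECIPIENT.len();
+    // The name runs until whitespace or the next special token.
+    let name_end = msg[name_start..]
+        .find(|c: char| c.is_whitespace() || c == '<')
+        .map_or(msg.len(), |i| name_start + i);
+    let body_start = msg.find(MESSAGE)? + MESSAGE.len();
+    // Stop at `<|call|>` if present; otherwise take the rest of the message.
+    let body = msg[body_start..].split("<|call|>").next().unwrap_or("");
+    Some((msg[name_start..name_end].to_string(), body.trim().to_string()))
+}
+```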
+## Quick Identification Guide
+1. **Look for `tojson` filter** → JSON format
+2. **Look for `<function=` or `<parameter=`** → XML format
+3. **Look for `<｜DSML｜`** → DSML format
+4. **Look for `function(arg=val)`** → Pythonic format
+5. **Look for `<|channel|>commentary`** → Harmony format
+6. **Check start/end markers** → Match to config preset
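+The checks above can be scripted as a rough first classifier. Below is a hypothetical std-only sketch; `Format` and `classify` are illustrative names. It tests the distinctive markers before falling back to `tojson`, because XML and DSML templates may also use `tojson` for argument values. It does not attempt Pythonic detection or start/end marker matching.
+```rust
+#[derive(Debug, PartialEq)]
+enum Format {
+    Dsml,
+    Harmony,
+    Xml,
+    Json,
+    Unknown,
+}
+/// Classify a chat template by its most distinctive tool-call marker.
+fn classify(template: &str) -> Format {
+    if template.contains("<｜DSML｜") {
+        Format::Dsml
+    } else if template.contains("<|channel|>commentary") {
+        Format::Harmony
+    } else if template.contains("<function=") || template.contains("<parameter=") {
+        Format::Xml
+    } else if template.contains("tojson") {
+        Format::Json
+    } else {
+        Format::Unknown
+    }
+}
+```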
+## Configuration Keys
+For JSON formats, check these keys in the template:
+- Function name: Usually `name` or `function`
+- Arguments: Usually `arguments` or `parameters`
+- Structure: Array `[{...}]` or single object `{...}`
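+Because key names vary, a parser can try a short list of candidate keys in order. Below is a hypothetical sketch of that lookup. A `HashMap<&str, &str>` stands in for an already-parsed JSON object; a real parser would use a JSON library such as serde_json.
+```rust
+use std::collections::HashMap;
+/// Return the value stored under the first candidate key present in `obj`.
+fn first_key<'a>(obj: &HashMap<&str, &'a str>, candidates: &[&str]) -> Option<&'a str> {
+    candidates.iter().find_map(|k| obj.get(*k).copied())
+}
+```
+With candidates `["arguments", "parameters"]`, the same code accepts either spelling of the arguments key.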
+## Matching Logic
+1. **Exact match** → Use existing config preset
+2. **Similar markers** → Create new config with same parser
+3. **New format** → Generate new parser implementation
--- a/.gitignore
+++ b/.gitignore
@@ -113,7 +113,8 @@ core
 /.cursor/instructions.md
 /.cursor/instructions.md.bak
 /.cursor
-/.claude
+/.claude/*
+!/.claude/skills/
 /CLAUDE.md
 /CLAUDE.md.bak