Structured debugging with session persistence

Verified · Caution

Structured debugging workflow that uses subagents for isolated investigation, preventing context pollution. It persists sessions, gathers symptoms systematically, and manages checkpoints so long debugging sessions can be resumed efficiently. Useful for complex issues that require deep, iterative exploration without losing progress.

by Skills Guide Bot
Development · Advanced
1206/2/2026
Claude Code
#debugging #scientific-method #workflow #subagent #session-persistence

Our review

A structured debugging workflow based on the scientific method. Investigation runs in isolated subagents so the main context stays clean, and persistent sessions track each investigation.

Strengths

  • Isolates investigation context via dedicated sub-agents, avoiding pollution of the main context.
  • Rigorous methodology for gathering symptoms, forming hypotheses, and verifying.
  • Persistent debug sessions with checkpoint resumption and full history.
  • Complete traceability with session files and root cause reporting.

Limitations

  • Requires initial setup and discipline to follow the steps closely.
  • Can be overkill for simple or trivial bugs.
  • Relies on sub-agents' ability to recover prior context from files.

When to use it

Best for complex bugs requiring deep, multi-step investigation where context loss is a risk.

When not to use it

Not suitable for quick fixes or obvious problems where informal debugging is sufficient.

Security analysis

Caution
Quality score: 95/100

The skill uses Bash for debugging, which could execute arbitrary commands, but the workflow is well-structured and does not instruct destructive or exfiltrating actions. The spawning of subagents is controlled, and no malicious patterns are present. Overall, it is a legitimate debugging tool, but Bash capability warrants caution.

No concerns found

Examples

Debug a server crash
My Node.js server crashes intermittently with 'FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory'. I'm running Node 18 on Ubuntu. It usually happens after a few hours under load. Can you debug this?
Investigate a failing CI build step
The 'lint' step in our CI pipeline fails randomly with no clear error message. It succeeds locally. The build uses GitHub Actions and runs on ubuntu-latest. What's going wrong?
Resume a previous debugging session
Resume my last debugging session for the authentication timeout issue.

name: gsd-debug
description: Structured debugging workflow with session persistence and investigation tracking
allowed-tools: Task, Read, Edit, Bash
argument-hint: [issue]

<objective>
Debug issues using the scientific method with subagent isolation.

Orchestrator role: Gather symptoms, spawn the gsd-debugger agent, handle checkpoints, and spawn continuation agents.

Why a subagent: Investigation consumes context quickly (reading files, forming hypotheses, testing). Each investigation gets a fresh 200k-token context, so the main context stays lean for user interaction.
</objective>

<context>
User's issue: $ARGUMENTS

Check for active sessions:

ls .planning/debug/*.md 2>/dev/null | grep -v resolved | head -5
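For readers less familiar with the shell pipeline, here is a rough Python equivalent of the active-session check. It is a sketch only: `list_active_sessions` is a hypothetical helper name, and sort order may differ slightly from `ls`, which follows the locale.

```python
from pathlib import Path


def list_active_sessions(debug_dir: str = ".planning/debug", limit: int = 5) -> list[str]:
    """Approximate `ls .planning/debug/*.md 2>/dev/null | grep -v resolved | head -5`."""
    root = Path(debug_dir)
    if not root.is_dir():
        # The shell version silences ls's error via 2>/dev/null;
        # here a missing directory simply yields no sessions.
        return []
    # ls prints matches sorted; sorted() approximates that ordering.
    paths = sorted(str(p) for p in root.glob("*.md") if p.is_file())
    # grep -v resolved: drop every path containing the substring "resolved".
    active = [p for p in paths if "resolved" not in p]
    # head -5: keep at most `limit` entries.
    return active[:limit]
```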
</context>

<process>

1. Check Active Sessions

If active sessions exist AND no $ARGUMENTS:

  • List sessions with status, hypothesis, next action
  • User picks number to resume OR describes new issue

If $ARGUMENTS provided OR user describes new issue:

  • Continue to symptom gathering
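Step 1's branching and session summary can be sketched as follows. The debug-file layout is an assumption: this sketch supposes `status:`, `hypothesis:` and `next_action:` lines, which may not match what gsd-debugger actually writes. `summarize_session`, `route` and `resolve_choice` are hypothetical helpers, and sending the "no sessions, no arguments" case to symptom gathering is also an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SessionSummary:
    path: str
    status: str
    hypothesis: str
    next_action: str


def summarize_session(path: str) -> SessionSummary:
    # ASSUMPTION: the debug file stores these fields as "key: value" lines.
    wanted = {"status", "hypothesis", "next_action"}
    found = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in wanted and key not in found:
            found[key] = value.strip()  # first occurrence wins
    return SessionSummary(
        path,
        found.get("status", "unknown"),
        found.get("hypothesis", "none recorded"),
        found.get("next_action", "none recorded"),
    )


def route(active_sessions: list[str], arguments: str) -> str:
    """Offer the session list only when sessions exist and no issue was passed in."""
    if active_sessions and not arguments.strip():
        return "list_sessions"
    # $ARGUMENTS given, or nothing to resume: go to symptom gathering.
    return "gather_symptoms"


def resolve_choice(reply: str, sessions: list[str]) -> Optional[str]:
    """A 1-based number resumes that session; any other reply is a new issue (None)."""
    reply = reply.strip()
    if reply.isdigit() and 1 <= int(reply) <= len(sessions):
        return sessions[int(reply) - 1]
    return None
```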

2. Gather Symptoms (if new issue)

Use AskUserQuestion for each:

  1. Expected behavior - What should happen?
  2. Actual behavior - What happens instead?
  3. Error messages - Any errors? (paste or describe)
  4. Timeline - When did this start? Ever worked?
  5. Reproduction - How do you trigger it?

Once all five are gathered, confirm with the user that you are ready to investigate.
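The five answers map naturally onto a small record, with the "ready to investigate" check reduced to "no question left blank". This is an illustrative sketch; the `Symptoms` class is hypothetical, and treating an explicit "none" as a valid answer is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class Symptoms:
    # Field order matches the order the five questions are asked.
    expected: str = ""
    actual: str = ""
    errors: str = ""
    timeline: str = ""
    reproduction: str = ""

    def missing(self) -> list[str]:
        """Names of questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_to_investigate(self) -> bool:
        # An explicit answer such as "none" counts; only blanks block the spawn.
        return not self.missing()
```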

3. Spawn gsd-debugger Agent

Fill prompt and spawn:

<objective>
Investigate issue: {slug}

**Summary:** {trigger}
</objective>

<symptoms>
expected: {expected}
actual: {actual}
errors: {errors}
reproduction: {reproduction}
timeline: {timeline}
</symptoms>

<mode>
symptoms_prefilled: true
goal: find_and_fix
</mode>

<debug_file>
Create: .planning/debug/{slug}.md
</debug_file>
Task(
  prompt=filled_prompt,
  subagent_type="gsd-debugger",
  description="Debug {slug}"
)
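A minimal sketch of the "fill prompt" step, assuming plain `str.format` substitution and a hypothetical `make_slug` rule (lowercase, non-alphanumeric runs collapsed to hyphens); the skill does not specify how `{slug}` is derived. The returned string is what would be passed as `prompt=` in the `Task(...)` call; the `Task` tool itself is not modeled here.

```python
import re

INITIAL_PROMPT = """<objective>
Investigate issue: {slug}

**Summary:** {trigger}
</objective>

<symptoms>
expected: {expected}
actual: {actual}
errors: {errors}
reproduction: {reproduction}
timeline: {timeline}
</symptoms>

<mode>
symptoms_prefilled: true
goal: find_and_fix
</mode>

<debug_file>
Create: .planning/debug/{slug}.md
</debug_file>"""


def make_slug(trigger: str, max_len: int = 40) -> str:
    # Hypothetical rule: lowercase, collapse non-alphanumerics to "-", trim.
    slug = re.sub(r"[^a-z0-9]+", "-", trigger.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "issue"


def fill_prompt(trigger: str, symptoms: dict) -> tuple[str, str]:
    """Return (slug, filled prompt); symptoms must hold all five keys."""
    slug = make_slug(trigger)
    return slug, INITIAL_PROMPT.format(slug=slug, trigger=trigger, **symptoms)
```

Because `str.format` does not re-parse substituted values, braces inside a user's pasted error message are inserted literally.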

4. Handle Agent Return

If ## ROOT CAUSE FOUND:

  • Display root cause and evidence summary
  • Offer options:
    • "Fix now" - spawn fix subagent
    • "Plan fix" - suggest {{COMMAND_PREFIX}}plan-phase --gaps
    • "Manual fix" - done

If ## CHECKPOINT REACHED:

  • Present checkpoint details to user
  • Get user response
  • Spawn continuation agent (see step 5)

If ## INVESTIGATION INCONCLUSIVE:

  • Show what was checked and eliminated
  • Offer options:
    • "Continue investigating" - spawn new agent with additional context
    • "Manual investigation" - done
    • "Add more context" - gather more symptoms, spawn again
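The three return cases above can be dispatched by looking for the agent's `## ...` status heading. This sketch assumes the heading sits on its own line and that the first recognized heading wins; `classify_return` and `RETURN_OPTIONS` are hypothetical names.

```python
RETURN_OPTIONS = {
    "ROOT CAUSE FOUND": ["Fix now", "Plan fix", "Manual fix"],
    # No menu: relay the checkpoint, collect the reply, spawn a continuation.
    "CHECKPOINT REACHED": [],
    "INVESTIGATION INCONCLUSIVE": [
        "Continue investigating",
        "Manual investigation",
        "Add more context",
    ],
}


def classify_return(agent_output: str) -> str:
    """Return the first recognized '## ...' status heading in the agent's output."""
    for line in agent_output.splitlines():
        heading = line.strip()
        if heading.startswith("## "):
            status = heading[3:].strip().upper()
            if status in RETURN_OPTIONS:
                return status
    raise ValueError("agent output has no recognized status heading")
```

Checkpoint returns deliberately map to an empty option list: they go back to the user, whose reply feeds step 5.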

5. Spawn Continuation Agent (After Checkpoint)

When user responds to checkpoint, spawn fresh agent:

<objective>
Continue debugging {slug}. Evidence is in the debug file.
</objective>

<prior_state>
Debug file: @.planning/debug/{slug}.md
</prior_state>

<checkpoint_response>
**Type:** {checkpoint_type}
**Response:** {user_response}
</checkpoint_response>

<mode>
goal: find_and_fix
</mode>
Task(
  prompt=continuation_prompt,
  subagent_type="gsd-debugger",
  description="Continue debug {slug}"
)
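The continuation prompt carries only a pointer to the debug file plus the checkpoint reply; everything else must be recovered from the file. A sketch under the same `str.format` assumption as above, with `continuation_prompt` as a hypothetical helper:

```python
CONTINUATION_PROMPT = """<objective>
Continue debugging {slug}. Evidence is in the debug file.
</objective>

<prior_state>
Debug file: @.planning/debug/{slug}.md
</prior_state>

<checkpoint_response>
**Type:** {checkpoint_type}
**Response:** {user_response}
</checkpoint_response>

<mode>
goal: find_and_fix
</mode>"""


def continuation_prompt(slug: str, checkpoint_type: str, user_response: str) -> str:
    # Only the slug, checkpoint type, and user reply are passed; the fresh
    # agent re-reads hypotheses and evidence from the debug file itself.
    return CONTINUATION_PROMPT.format(
        slug=slug, checkpoint_type=checkpoint_type, user_response=user_response
    )
```

Keeping the prompt this small is what lets each continuation start with a fresh context while still picking up where the previous agent stopped.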
</process>

<success_criteria>

  • [ ] Active sessions checked
  • [ ] Symptoms gathered (if new)
  • [ ] gsd-debugger spawned with context
  • [ ] Checkpoints handled correctly
  • [ ] Root cause confirmed before fixing

</success_criteria>