Operational Runbook Documentation

Verified · Safe

Create operational runbooks for deployments, database operations, incident response, and maintenance procedures. Practical how-to guides following the Diátaxis framework.

by Skills Guide Bot
DevOps · Intermediate
306/2/2026
Claude Code, Cursor, Windsurf, Copilot, Codex
#runbook #documentation #operations #deployment #ci-cd

Our review

Creates and updates operational runbooks for deployment, database, maintenance, and incident response procedures.

Strengths

  • Uses a structured template and quality checklist
  • Includes risk levels and rollback steps
  • Integrates strict documentation principles
  • Produces copy-paste ready commands

Limitations

  • Requires documentation principles to be pre-defined
  • Asks for user confirmation before creating new runbooks
  • High-risk procedures require team approval

When to use it

Use to document a repeatable operational procedure with clear steps and failure recovery.

When not to use it

Avoid for ad-hoc one-off tasks or when the process is not yet implemented in code.

Security analysis

Safe
Quality score: 88/100

This skill is purely for documentation purposes. It does not instruct any tool execution, system commands, or data handling. The content is about creating runbooks, which is a passive, informative task with no inherent security risk.

No concerns found

Examples

Deployment Runbook
Create a runbook for deploying to production, including steps, rollback, and failure recovery.
Database Migration Runbook
Document the database migration process with commands, expected outputs, and rollback plan.
Incident Response Runbook
Write an incident response runbook for a service outage, including troubleshooting and escalation steps.

name: document-runbook
description: Document CI/CD, deployment, and operational procedures. Use when creating runbooks, documenting deployment processes, or writing operational guides.

Runbook Documentation Skill

Purpose

Create operational runbooks for deployment, database operations, incident response, and maintenance procedures. These are "How-To" documents in the Diátaxis framework, specifically for operations tasks.

Smart Interaction

ASK the User When:

  • Creating new runbook: Confirm process name and risk level
  • Deleting runbook: Always confirm before deletion
  • High-risk procedures: Confirm rollback steps are adequate

PROCEED Autonomously When:

  • Updating existing runbook: Add new steps, update commands
  • Adding troubleshooting: Enhance with new failure scenarios
  • Fixing commands: Correct outdated or broken commands
  • Adding verification steps: Improve procedure completeness
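The ask-or-proceed rules above can be sketched as a small decision function. This is only an illustration: the `Action` enum and `needs_confirmation` are hypothetical names, and treating any change to a high-risk procedure as needing confirmation is an assumption about how the "High-risk procedures" rule is meant to apply.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the requests listed above."""
    CREATE = "create"
    DELETE = "delete"
    UPDATE = "update"
    ADD_TROUBLESHOOTING = "add_troubleshooting"
    FIX_COMMAND = "fix_command"
    ADD_VERIFICATION = "add_verification"


# Actions that always require asking the user first.
ASK_FIRST = {Action.CREATE, Action.DELETE}


def needs_confirmation(action: Action, risk_level: str = "low") -> bool:
    """Return True when the user should be asked before proceeding."""
    if action in ASK_FIRST:
        return True
    # Assumption: any change to a high-risk procedure triggers a
    # confirmation that the rollback steps are still adequate.
    return risk_level.lower() == "high"
```

Under these assumptions, `needs_confirmation(Action.UPDATE)` is `False`, while `needs_confirmation(Action.FIX_COMMAND, "high")` is `True`.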

Documentation Principles (CRITICAL)

Before writing ANY documentation, review ../DOCUMENTATION_PRINCIPLES.md for:

  1. Ground Truth Only - Document what exists in code, no speculation
  2. Writing Tone - Clear and educational without audience labels
  3. Code Examples - Real files with paths and line numbers
  4. Performance Docs - Techniques + measurement methods, NOT estimated timings
  5. What NOT to include - No troubleshooting, future work, or meta-commentary
  6. Diagrams - Use when they clarify technical details, not for decoration

These principles override any template suggestions that conflict with them.

Note: Runbooks are the APPROPRIATE place for troubleshooting content (unlike feature docs).

Instructions

When documenting operational procedures:

  1. Identify the process (deploy, rollback, migration, etc.)
  2. Use the runbook template at templates/runbook.md
  3. Include actual commands that can be copy-pasted
  4. Document failure scenarios and recovery steps
  5. Output to /docs/operations/[category]/[process-name].md (see Output Location)

Template

Use the template at: .claude/skills/document-runbook/templates/runbook.md

Runbook Categories

Organize runbooks by category:

docs/operations/
├── index.md                    # Operations overview
├── deployment/
│   ├── deploy-to-production.md
│   ├── deploy-to-staging.md
│   └── rollback-deployment.md
├── database/
│   ├── database-migration.md
│   ├── backup-restore.md
│   └── seed-data.md
├── maintenance/
│   ├── dependency-updates.md
│   └── log-rotation.md
└── incident-response/
    ├── service-outage.md
    └── data-corruption.md

Command Standards

  • All commands must be copy-paste ready
  • Use environment variables for secrets: $DATABASE_URL
  • Include --dry-run options where available
  • Show both successful and error outputs
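One way to check the secrets rule is to scan a runbook's commands for connection URLs that embed a literal password rather than a variable such as `$DATABASE_URL`. The sketch below is a rough heuristic, not part of the skill: `find_inline_secrets`, the regular expression, and the sample host `db.internal` are all hypothetical.

```python
import re

# Matches "scheme://user:password@" where the password is a literal,
# not a shell variable such as $DB_PASSWORD or ${DB_PASSWORD}.
INLINE_CREDENTIAL = re.compile(r"://[^\s/:@]+:(?!\$)[^\s@]+@")


def find_inline_secrets(commands: list[str]) -> list[str]:
    """Return the commands that appear to embed a literal password in a URL."""
    return [cmd for cmd in commands if INLINE_CREDENTIAL.search(cmd)]


# Illustrative commands only; the host and credentials are made up.
commands = [
    'psql "$DATABASE_URL" -c "SELECT 1"',
    'psql postgres://admin:hunter2@db.internal/app -c "SELECT 1"',
    'psql postgres://admin:$DB_PASSWORD@db.internal/app -c "SELECT 1"',
]
flagged = find_inline_secrets(commands)
```

Only the second command is flagged here. A check like this catches one common pattern; it does not detect tokens passed as flags or in headers.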

Risk Levels

| Level  | Definition                           | Review Required |
| ------ | ------------------------------------ | --------------- |
| Low    | No data loss risk, easily reversible | None            |
| Medium | Potential service disruption         | Team lead       |
| High   | Data loss risk, hard to reverse      | Team approval   |
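The risk levels can also be expressed as a lookup, which is handy if a script needs to decide who must sign off on a runbook. This is a minimal sketch; `REVIEW_REQUIRED` and `required_review` are hypothetical names, and the values are copied from the table above.

```python
from typing import Optional

# Risk level -> reviewer required before the runbook is accepted.
REVIEW_REQUIRED = {
    "low": None,              # No data loss risk, easily reversible
    "medium": "Team lead",    # Potential service disruption
    "high": "Team approval",  # Data loss risk, hard to reverse
}


def required_review(risk_level: str) -> Optional[str]:
    """Return the review required for a risk level, or None if no review is needed."""
    key = risk_level.strip().lower()
    if key not in REVIEW_REQUIRED:
        raise ValueError(f"Unknown risk level: {risk_level!r}")
    return REVIEW_REQUIRED[key]
```

Raising on an unknown level, rather than defaulting to "no review", is a deliberate choice in this sketch so that a typo cannot silently skip approval.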

Output Location

| Category    | Output Path                                  |
| ----------- | -------------------------------------------- |
| Deployment  | /docs/operations/deployment/[name].md        |
| Database    | /docs/operations/database/[name].md          |
| Maintenance | /docs/operations/maintenance/[name].md       |
| Incident    | /docs/operations/incident-response/[name].md |
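The category-to-path mapping can be sketched as a small helper. The function name `output_path` and the dictionary `CATEGORY_DIRS` are hypothetical; the directory names come from the table above.

```python
from pathlib import PurePosixPath

# Category name -> subdirectory under /docs/operations.
CATEGORY_DIRS = {
    "deployment": "deployment",
    "database": "database",
    "maintenance": "maintenance",
    "incident": "incident-response",
}


def output_path(category: str, name: str) -> str:
    """Build the output path for a runbook from its category and name."""
    key = category.strip().lower()
    if key not in CATEGORY_DIRS:
        raise ValueError(f"Unknown runbook category: {category!r}")
    return str(PurePosixPath("/docs/operations") / CATEGORY_DIRS[key] / f"{name}.md")
```

For example, `output_path("Incident", "service-outage")` yields `/docs/operations/incident-response/service-outage.md`, matching the incident response example in this guide.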

Quality Checklist

Before completing:

  • [ ] All commands are copy-paste ready
  • [ ] Expected outputs documented
  • [ ] Failure scenarios covered
  • [ ] Rollback procedure included
  • [ ] Troubleshooting table complete
  • [ ] Prerequisites clearly listed
  • [ ] Time estimate provided
  • [ ] Risk level assessed
  • [ ] Emergency contacts included (for high-risk)
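If the checklist is kept inside the runbook file, a short script can report which items are still open. This is a sketch under that assumption; `unchecked_items` is a hypothetical helper, and it accepts `-`, `*`, or `•` as the bullet marker.

```python
import re

# A checklist line: optional indent, a bullet, then "[ ]" or "[x]" and the item text.
CHECKBOX = re.compile(r"^\s*[-*•]\s*\[( |x|X)\]\s*(.+?)\s*$")


def unchecked_items(text: str) -> list[str]:
    """Return the text of every checklist item that is not ticked."""
    items = []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2))
    return items


sample = """
- [x] All commands are copy-paste ready
- [ ] Rollback procedure included
  • [ ] Risk level assessed
Some unrelated line
"""
remaining = unchecked_items(sample)
```

Here `remaining` holds the two open items, "Rollback procedure included" and "Risk level assessed". A runbook would be considered complete only when the list is empty.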

Examples

Creating New Runbooks (Will Ask User)

  • "Create a deployment runbook" → Ask: Risk level? Environment?
  • "Document the database migration process" → Confirm category and scope

Updating Existing Runbooks (Autonomous)

  • "Add new step to deployment runbook" → Updates existing doc
  • "Fix the database restore command" → Corrects command
  • "Add troubleshooting for timeout errors" → Adds to troubleshooting table

By Category

  • "Document production deployment" → /docs/operations/deployment/deploy-to-production.md
  • "Create database backup runbook" → /docs/operations/database/backup-restore.md
  • "Write incident response for outages" → /docs/operations/incident-response/service-outage.md