PDF Processing

Verified · Safe

Extracts text from PDFs, fills forms, and merges documents. Use when working with PDF files or document extraction.

by Skills Guide Bot
Data & AI · Intermediate
206/2/2026
Claude Code
#pdf-processing #document-extraction #form-filling #text-extraction

Our review

Extracts text from PDFs, fills forms, and merges documents with a structured step-by-step workflow.

Strengths

  • Built-in validation step to check output quality
  • Simple, well-documented Python scripts
  • Analyzes PDF structure including form fields

Limitations

  • Requires installing Python dependencies (pypdf, pdfplumber)
  • Does not support image-based PDFs (no OCR)
  • Error handling is basic; may need enhancement for edge cases

When to use it

Use this skill when you need to reliably extract text, fill forms, or merge PDFs as part of a repeatable workflow.

When not to use it

Avoid this skill for a quick one-off PDF read or when the PDF consists entirely of scanned images without embedded text.

Security analysis

Safe
Quality score: 80/100

This is a template for creating code skills. It contains no executable commands, network calls, or destructive actions. The example scripts are benign PDF analysis tools, and the template itself does not instruct any dangerous operations. No risk present.

No concerns found

Examples

Extract text from a PDF
Extract the text from the PDF file 'report.pdf' and save it to 'output.txt' using the PDF processing skill.
Analyze PDF structure
Analyze the structure of 'form.pdf' - show page count, form fields, and whether it contains text.
Merge multiple PDFs
Merge the PDFs 'doc1.pdf' and 'doc2.pdf' into a single file 'merged.pdf' using the merging script.
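
The merge prompt above refers to a merging script, which the template further down does not include. The following is only a minimal sketch of what such a script might look like, assuming a recent pypdf where `PdfWriter.append` is available. The file name `merge.py`, its argument order, and the `merge_pdfs` helper are hypothetical, not part of the skill.

```python
import sys


def merge_pdfs(input_paths, output_path):
    """Concatenate the input PDFs, in order, into output_path."""
    # Imported lazily so the usage check in main() still works
    # when pypdf is not installed yet.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        writer.append(path)  # appends every page of the file
    with open(output_path, "wb") as f:
        writer.write(f)
    writer.close()


def main(argv):
    # Hypothetical CLI: merge.py OUTPUT INPUT1 INPUT2 [INPUT3 ...]
    if len(argv) < 4:
        print("Usage: python merge.py output.pdf input1.pdf input2.pdf [...]",
              file=sys.stderr)
        return 1
    output_path, input_paths = argv[1], argv[2:]
    if output_path in input_paths:
        print("Error: output file must differ from the inputs", file=sys.stderr)
        return 1
    try:
        merge_pdfs(input_paths, output_path)
    except Exception as e:
        print(f"Error merging PDFs: {e}", file=sys.stderr)
        return 1
    return 0
```

To use it as a script, call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard, then run `python merge.py merged.pdf doc1.pdf doc2.pdf`.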

Code Skill Template

For skills that include scripts


Directory structure

my-code-skill/
├── SKILL.md
├── reference.md
└── scripts/
    ├── analyze.py
    ├── process.py
    └── validate.py

SKILL.md example

---
name: processing-pdfs
description: "Extracts text from PDFs, fills forms, merges documents. Use when working with PDF files or document extraction."
allowed-tools:
  - Bash
  - Read
  - Write
---

# PDF Processing

## Quick start

Extract text:
```bash
python scripts/extract_text.py input.pdf > output.txt
```

## Workflow

Copy this checklist:

```
Progress:
- [ ] Step 1: Analyze PDF structure
- [ ] Step 2: Extract content
- [ ] Step 3: Validate output
```

### Step 1: Analyze PDF

```bash
python scripts/analyze.py input.pdf
```

Output shows page count, form fields, and whether the PDF contains extractable text.

### Step 2: Extract content

```bash
python scripts/extract_text.py input.pdf > output.txt
```

### Step 3: Validate

```bash
python scripts/validate.py output.txt
```

Fix any issues before proceeding.

## Scripts reference

| Script | Purpose |
|--------|---------|
| analyze.py | Analyze PDF structure |
| extract_text.py | Extract text |
| fill_form.py | Fill form fields |
| validate.py | Validate results |

For the detailed API, see reference.md.

## Dependencies

```bash
pip install pypdf pdfplumber
```

---
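
The workflow's validation step calls `scripts/validate.py`, which the template lists but never shows. As a rough illustration only, the checks might look like the sketch below. The function names, the thresholds `MIN_CHARS` and `MIN_PRINTABLE_RATIO`, and the exit codes are all assumptions made for this example and should be tuned to the documents at hand.

```python
import json
import sys

# Assumed thresholds, not taken from the skill; adjust as needed.
MIN_CHARS = 1
MIN_PRINTABLE_RATIO = 0.9


def validate_text(text):
    """Return a list of problems found in extracted text (empty if none)."""
    problems = []
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        problems.append("output is empty; the PDF may be image-only (no OCR)")
        return problems
    # Share of characters that are printable or ordinary whitespace
    printable = sum(ch.isprintable() or ch.isspace() for ch in stripped)
    ratio = printable / len(stripped)
    if ratio < MIN_PRINTABLE_RATIO:
        problems.append(
            f"only {ratio:.0%} printable characters; extraction may be garbled")
    if "\ufffd" in stripped:
        problems.append("contains U+FFFD replacement characters")
    return problems


def main(argv):
    # Returns 0 if all checks pass, 1 if problems were found, 2 on usage/IO errors.
    if len(argv) != 2:
        print("Usage: python validate.py output.txt", file=sys.stderr)
        return 2
    try:
        with open(argv[1], encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError as e:
        print(f"Error reading {argv[1]}: {e}", file=sys.stderr)
        return 2
    problems = validate_text(text)
    print(json.dumps({"ok": not problems, "problems": problems}, indent=2))
    return 0 if not problems else 1
```

Wired up with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, a non-zero exit code gives the validate, fix, re-validate loop a clear signal for when to stop.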

## scripts/analyze.py example

```python
#!/usr/bin/env python3
"""
analyze.py - Analyze PDF structure

Usage:
    python analyze.py input.pdf

Output (JSON):
    - page_count: number of pages
    - form_fields: names of the form fields
    - has_text: whether any page has extractable text
"""

import sys
import json
from pypdf import PdfReader

def main():
    if len(sys.argv) != 2:
        print("Usage: python analyze.py input.pdf", file=sys.stderr)
        sys.exit(1)

    pdf_path = sys.argv[1]

    # Page access can fail too (e.g. encrypted or damaged files),
    # so the whole analysis sits inside the try block.
    try:
        reader = PdfReader(pdf_path)
        result = {
            "page_count": len(reader.pages),
            # get_fields() covers all field types and returns None if there is no form
            "form_fields": list((reader.get_fields() or {}).keys()),
            # Stops at the first page that yields non-blank text
            "has_text": any(
                (page.extract_text() or "").strip() for page in reader.pages
            ),
        }
    except Exception as e:
        print(f"Error reading PDF: {e}", file=sys.stderr)
        sys.exit(1)

    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    main()
```

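Key points

  1. Workflow checklist: tracks progress
  2. Feedback loop: validate → fix → re-validate
  3. Script documentation: state Usage and Output explicitly
  4. Handle errors directly: do not leave them for Claude to sort out
  5. Declare dependencies: include the pip install command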
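
Point 4 can be illustrated with a small pre-flight check that a script runs before handing the file to a PDF library, so that failures produce a specific message and a non-zero exit code instead of a raw traceback. This is a hedged sketch: the name `check_pdf_path`, the `preflight.py` file name, and the lenient 1024-byte header search are assumptions made for illustration, not part of the skill.

```python
import os
import sys


def check_pdf_path(path):
    """Return an error message for an unusable input path, or None if it looks OK."""
    if not os.path.exists(path):
        return f"File not found: {path}"
    if not os.path.isfile(path):
        return f"Not a regular file: {path}"
    try:
        with open(path, "rb") as f:
            header = f.read(1024)
    except OSError as e:
        return f"Cannot read {path}: {e}"
    # Lenient check (an assumption): some files carry junk bytes before the header.
    if b"%PDF-" not in header:
        return f"Not a PDF (missing %PDF- header): {path}"
    return None


def main(argv):
    if len(argv) != 2:
        print("Usage: python preflight.py input.pdf", file=sys.stderr)
        return 1
    error = check_pdf_path(argv[1])
    if error:
        # Specific, actionable message on stderr plus a non-zero exit code
        print(f"Error: {error}", file=sys.stderr)
        return 1
    print("OK")
    return 0
```

Returning a message rather than raising keeps the check easy to reuse from analyze.py or any other script in the skill.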