Our review
This skill extracts text from PDFs, fills forms, and merges documents using Python scripts.
Strengths
- Automates repetitive PDF tasks
- Built-in validation to ensure extracted data integrity
- Supports PDF form fields
Limitations
- Requires installing pypdf and pdfplumber dependencies
- May not handle complex PDFs with images or tables
Use this skill when you need to batch process PDFs or extract structured data from documents.
Avoid this skill for heavily scanned PDFs or documents requiring OCR.
Security analysis
Safe
The skill is a template containing only static documentation and a harmless Python script that reads PDF metadata. There are no destructive commands, no network calls, and no obfuscation. It poses no execution risk.
No concerns found
Examples
- Extract text from the file report.pdf and save it to report.txt.
- Analyze the structure of the file invoice.pdf, including page count and form fields.
- Fill the form fields in the file application.pdf with the data from data.json and save the output as filled_application.pdf.
Code Skill Template
For skills that include scripts
Directory structure
my-code-skill/
├── SKILL.md
├── reference.md
└── scripts/
    ├── analyze.py
    ├── process.py
    └── validate.py
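A quick way to confirm that a skill folder matches this layout is sketched below. It is a minimal illustration using only the standard library; `check_skill_layout` and `EXPECTED` are hypothetical names, and the file list is simply copied from the tree above.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
EXPECTED = [
    "SKILL.md",
    "reference.md",
    "scripts/analyze.py",
    "scripts/process.py",
    "scripts/validate.py",
]


def check_skill_layout(skill_dir):
    """Return the expected files that are missing from skill_dir.

    Hypothetical helper: an empty list means the layout is complete.
    """
    root = Path(skill_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

Calling `check_skill_layout("my-code-skill")` returns the relative paths that still need to be created, in the order listed in `EXPECTED`.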
SKILL.md example
---
name: processing-pdfs
description: "Extracts text from PDFs, fills forms, merges documents. Use when working with PDF files or document extraction."
allowed-tools:
- Bash
- Read
- Write
---
# PDF Processing
## Quick start
Extract text:
```bash
python scripts/extract_text.py input.pdf > output.txt
```
## Workflow
Copy this checklist:
```
Progress:
- [ ] Step 1: Analyze PDF structure
- [ ] Step 2: Extract content
- [ ] Step 3: Validate output
```
### Step 1: Analyze PDF
```bash
python scripts/analyze.py input.pdf
```
Output shows page count, form fields, and whether the PDF contains extractable text.
### Step 2: Extract content
```bash
python scripts/extract_text.py input.pdf > output.txt
```
### Step 3: Validate
```bash
python scripts/validate.py output.txt
```
Fix any issues before proceeding.
## Scripts reference
| Script | Purpose |
|--------|---------|
| analyze.py | Analyze PDF structure |
| extract_text.py | Extract text |
| fill_form.py | Fill form fields |
| validate.py | Validate results |

For the detailed API, see reference.md.
## Dependencies
```bash
pip install pypdf pdfplumber
```
---
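The frontmatter at the top of the SKILL.md example can be checked mechanically before a skill is published. The sketch below is only an illustration under stated assumptions: it uses a deliberately small hand-rolled parser (flat `key: value` pairs and `- item` lists) rather than a real YAML library, `parse_frontmatter`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and treating `name` and `description` as required is inferred from the example rather than from any specification.

```python
REQUIRED_KEYS = ("name", "description")  # assumption, based on the example


def parse_frontmatter(text):
    """Parse the block between the first two '---' lines.

    Handles only flat 'key: value' pairs and '- item' lists, which is
    enough for the example; it is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    meta, current = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta
        if not stripped:
            continue
        if stripped.startswith("- ") and current is not None:
            # List item for the most recent key that had no inline value
            meta[current].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip().strip('"')
            if value:
                meta[key] = value
                current = None
            else:
                meta[key] = []
                current = key
    raise ValueError("missing closing '---'")


def missing_keys(meta):
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not meta.get(k)]
```

For the example above, `parse_frontmatter` would yield `name`, `description`, and an `allowed-tools` list, and `missing_keys` would return an empty list.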
## scripts/analyze.py example
```python
#!/usr/bin/env python3
"""
analyze.py - Analyze PDF structure

Usage:
    python analyze.py input.pdf

Output (JSON):
    - Page count
    - Form field names (text fields only)
    - Whether any page contains extractable text
"""
import json
import sys

from pypdf import PdfReader


def main():
    if len(sys.argv) != 2:
        print("Usage: python analyze.py input.pdf", file=sys.stderr)
        sys.exit(1)

    pdf_path = sys.argv[1]
    try:
        reader = PdfReader(pdf_path)
    except Exception as e:
        print(f"Error reading PDF: {e}", file=sys.stderr)
        sys.exit(1)

    result = {
        "page_count": len(reader.pages),
        "form_fields": [],
        "has_text": False,
    }

    # Form fields: get_form_text_fields() returns a dict of text fields
    text_fields = reader.get_form_text_fields()
    if text_fields:
        result["form_fields"] = list(text_fields.keys())

    # Check for text; stop at the first page that has any
    for page in reader.pages:
        if (page.extract_text() or "").strip():
            result["has_text"] = True
            break

    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
```
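Key points
- Workflow checklist: tracks progress
- Feedback loop: validate → fix → re-validate
- Script documentation: state Usage and Output
- Handle errors directly: don't hand them off to Claude
- Declare dependencies: include the pip install command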
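The template names `validate.py` but does not show it. As one hedged illustration of the validation step in the feedback loop, the sketch below checks an extracted text file. The specific checks (non-empty output, no U+FFFD replacement characters) and the names `validate_text` and `main` are assumptions for illustration, not the skill's actual script.

```python
import sys


def validate_text(text):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not text.strip():
        problems.append("output is empty")
    # U+FFFD usually signals a decoding or extraction failure.
    bad = text.count("\ufffd")
    if bad:
        problems.append(f"{bad} replacement character(s) found")
    return problems


def main(argv):
    """Validate the file named in argv[1]; return a process exit code."""
    if len(argv) != 2:
        print("Usage: python validate.py output.txt", file=sys.stderr)
        return 1
    try:
        with open(argv[1], encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError as e:
        # Report the error with a clear message instead of a traceback.
        print(f"Error reading {argv[1]}: {e}", file=sys.stderr)
        return 1
    problems = validate_text(text)
    for p in problems:
        print(f"FAIL: {p}", file=sys.stderr)
    return 1 if problems else 0
```

Wiring `main` to `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard would turn this into a script. A non-zero exit code signals that the extraction should be fixed and validation run again.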