Our review
Search and extract content from PDF files without loading entire documents into context.
Strengths
- Fast searching with pdfgrep, supporting case-insensitive and regex patterns.
- Page-range extraction with pdftotext to minimize context usage.
- Ability to preserve layout for tables and structured text.
Limitations
- Does not work with scanned (image-based) PDFs.
- Requires prior installation of pdfgrep and poppler-utils.
- Context is line-based: -C shows a fixed number of lines before and after each match, not whole paragraphs or pages.
Use it when you need to quickly locate and extract specific content from a large PDF without reading the entire file.
Avoid it when the PDF is image-based or requires precise layout extraction (use OCR or dedicated tools instead).
Security analysis
Safe. The skill uses read-only commands (pdfgrep, pdftotext) to search and extract text from PDFs. No destructive, exfiltrating, or obfuscated actions are instructed.
No concerns found
Examples
- Search for 'authentication' in the manual.pdf file and show me the page numbers and a few lines of context around each match.
- Extract pages 10 to 15 from large-report.pdf and output the text.
- Tell me how many pages are in the document guide.pdf.
name: pdf-tools
description: Search and extract content from PDF files. Use when searching PDFs, finding text in documents, or extracting specific pages without reading the entire file.
allowed-tools: Bash, Read, Glob
PDF Tools
Search and extract content from PDFs without loading entire files into context.
Installation
# macOS
brew install pdfgrep poppler
# Ubuntu/Debian
sudo apt install pdfgrep poppler-utils
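Before running the commands in the sections that follow, it can help to confirm that the binaries are actually on PATH. A minimal sketch, assuming a POSIX shell; `missing_tools` is a hypothetical helper name, not something shipped with pdfgrep or poppler:

```shell
# Hypothetical helper: print each named command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling `missing_tools pdfgrep pdftotext pdfinfo` prints nothing when all three are installed, and otherwise lists the ones that still need installing.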
Quick Reference
| Task | Command |
|------|---------|
| Search | pdfgrep "term" file.pdf |
| Search with page numbers | pdfgrep -n "term" file.pdf |
| Search with context | pdfgrep -n -C 2 "term" file.pdf |
| Get page count | pdfinfo file.pdf \| grep Pages |
| Extract pages 5-10 | pdftotext -f 5 -l 10 file.pdf - |
Core Workflow
Step 1: Search - Find where content lives
# -i so that both "authentication" and "Authentication" match
pdfgrep -n -i "authentication" large-manual.pdf
# Output:
# 42: User authentication requires...
# 45: Authentication tokens expire...
Step 2: Extract - Get just those pages
pdftotext -f 41 -l 46 large-manual.pdf -
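The two steps can be chained so the page range is computed from the search results instead of being read off by eye. A rough sketch, assuming `pdfgrep -n` is run on a single file so each output line starts with the page number and a colon; `page_span` is a hypothetical helper, and padding by one page on each side is an illustrative choice:

```shell
# Hypothetical helper: read `pdfgrep -n` output on stdin and print
# "FIRST LAST", the span of matched pages padded by one page per side.
# Lines that do not start with "PAGE:" are ignored.
page_span() {
  awk -F: '
    $1 ~ /^[0-9]+$/ {
      if (min == "" || $1 + 0 < min) min = $1 + 0
      if ($1 + 0 > max) max = $1 + 0
    }
    END {
      if (min == "") exit 1                 # no matches on stdin
      first = (min > 1) ? min - 1 : 1       # never go below page 1
      print first, max + 1
    }'
}
```

For matches on pages 42 and 45, `page_span` prints `41 46`, which can be passed to `pdftotext -f` and `-l`. The upper bound is not checked against the document length, so compare it with the page count from `pdfinfo` if that matters.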
Search Commands
# Basic search
pdfgrep "search term" document.pdf
# Case-insensitive
pdfgrep -i "search term" document.pdf
# With page numbers
pdfgrep -n "search term" document.pdf
# With context (2 lines before/after)
pdfgrep -n -C 2 "search term" document.pdf
# Count occurrences
pdfgrep -c "search term" document.pdf
# Search all PDFs under a directory (recursive)
pdfgrep -r "term" /path/to/pdfs/
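When a term appears on many pages, a per-page tally helps decide which pages are worth extracting. A small sketch, assuming single-file `pdfgrep -n` output on stdin; `hits_per_page` is a hypothetical name, and it counts matching lines rather than individual matches:

```shell
# Hypothetical helper: read `pdfgrep -n` output for a single file on stdin
# and print "PAGE COUNT" pairs, one per page, sorted by page number.
hits_per_page() {
  awk -F: '$1 ~ /^[0-9]+$/ { count[$1]++ }
           END { for (page in count) print page, count[page] }' | sort -n
}
```

A typical call would be `pdfgrep -n -i "search term" document.pdf | hits_per_page`.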
Extract Commands
# Extract specific page range
pdftotext -f 10 -l 15 document.pdf -
# Extract single page
pdftotext -f 42 -l 42 document.pdf -
# Preserve layout (for tables)
pdftotext -layout -f 10 -l 10 document.pdf -
# Extract and limit output
pdftotext -f 10 -l 15 document.pdf - | head -50
Metadata
# Get page count
pdfinfo document.pdf | grep Pages
# Full metadata
pdfinfo document.pdf
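`grep Pages` returns the whole line, label included. If only the number is needed, for example to clamp a page range in a script, the line can be parsed further. A minimal sketch, assuming the usual `Pages:` label in `pdfinfo` output; `page_count` is a hypothetical helper name:

```shell
# Hypothetical helper: read `pdfinfo` output on stdin and print only the
# number from the "Pages:" line.
page_count() {
  awk '$1 == "Pages:" { print $2; exit }'
}
```

Used as `pdfinfo document.pdf | page_count`, this prints just the page count.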
Troubleshooting
Empty output from pdftotext: PDF is image-based (scanned). These tools work with text-based PDFs only.
pdfgrep missing matches: Try case-insensitive (-i). Check if PDF has selectable text.
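The image-based case can be detected up front by checking whether a few pages yield any visible text. A rough sketch, assuming that whitespace-only output (including the form feeds pdftotext emits between pages) means there is no text layer; `has_text` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if stdin contains at least one
# non-whitespace character, fail otherwise.
has_text() {
  grep -q '[^[:space:]]'
}
```

For example, `pdftotext -f 1 -l 3 document.pdf - | has_text || echo "no text layer found; OCR is probably needed"` samples the first three pages before any searching starts.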