PDF Tools

Verified · Safe

Search and extract content from PDF files without loading entire documents. Use pdfgrep for text searches, pdftotext to extract specific pages, and pdfinfo for the page count and other metadata. Helps when you need to find information in large PDFs or extract relevant sections without reading the whole file.

By Skills Guide Bot
Documentation · Beginner
406/2/2026
Claude Code, Cursor, Windsurf, Copilot, Codex
#pdf #search #extraction #pdfgrep #poppler

Our review

Search and extract content from PDF files without loading entire documents into context.

Strengths

  • Fast searching with pdfgrep, supporting case-insensitive and regex patterns.
  • Page-range extraction with pdftotext to minimize context usage.
  • Ability to preserve layout for tables and structured text.

Limitations

  • Does not work with scanned (image-based) PDFs.
  • Requires prior installation of pdfgrep and poppler-utils.
  • Context is line-based: pdfgrep -C shows a fixed number of lines around each match, so longer passages need a follow-up pdftotext extraction.

When to use it

When you need to quickly locate and extract specific content from a large PDF without reading the entire file.

When not to use it

When the PDF is image-based or requires precise layout extraction (use OCR or dedicated tools instead).

Security analysis

Safe
Quality score: 95/100

The skill uses read-only commands (pdfgrep, pdftotext) to search and extract text from PDFs. No destructive, exfiltrating, or obfuscated actions are instructed.

No concerns found

Examples

Search for a term in a PDF
Search for 'authentication' in the manual.pdf file and show me the page numbers and a few lines of context around each match.
Extract pages from a PDF as text
Extract pages 10 to 15 from large-report.pdf and output the text.
Get page count of a PDF
Tell me how many pages are in the document guide.pdf.

---
name: pdf-tools
description: Search and extract content from PDF files. Use when searching PDFs, finding text in documents, or extracting specific pages without reading the entire file.
allowed-tools: Bash, Read, Glob
---

PDF Tools

Search and extract content from PDFs without loading entire files into context.

Installation

```bash
# macOS
brew install pdfgrep poppler

# Ubuntu/Debian
sudo apt install pdfgrep poppler-utils
```
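Before running the workflow, it can help to confirm that the tools are on PATH (pdftotext and pdfinfo come from the poppler / poppler-utils package). A minimal bash sketch; `require_tools` is a hypothetical helper name, not part of either package.

```bash
# Hypothetical helper: report every missing command-line tool and fail if any are missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_tools pdfgrep pdftotext pdfinfo || echo "install pdfgrep and poppler first"
```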

Quick Reference

| Task | Command |
|------|---------|
| Search | `pdfgrep "term" file.pdf` |
| Search with page numbers | `pdfgrep -n "term" file.pdf` |
| Search with context | `pdfgrep -n -C 2 "term" file.pdf` |
| Get page count | `pdfinfo file.pdf \| grep Pages` |
| Extract pages 5-10 | `pdftotext -f 5 -l 10 file.pdf -` |


Core Workflow

Step 1: Search - Find where content lives

```bash
pdfgrep -n "authentication" large-manual.pdf
# Output: 42: User authentication requires...
#         45: Authentication tokens expire...
```

Step 2: Extract - Get only the pages spanning those matches (here with one page of margin on each side)

```bash
pdftotext -f 41 -l 46 large-manual.pdf -
```
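The search and extract steps can be chained. A minimal bash sketch, assuming a single input file, where `pdfgrep -n` prints match lines as `PAGE:text` (with several files pdfgrep also prefixes the file name, which this parser does not handle). `matches_to_range` and `search_then_extract` are hypothetical helper names; the default one-page padding reproduces the 41-46 range used for matches on pages 42 and 45.

```bash
# Hypothetical helper: read `pdfgrep -n` output for ONE file (lines like "42:text")
# and print "FIRST LAST", padded by PAD pages (default 1) and clamped to page 1.
matches_to_range() {
  awk -F: -v pad="${1:-1}" '
    $1 ~ /^[0-9]+$/ {               # keep only PAGE:text match lines
      p = $1 + 0
      if (n++ == 0 || p < min) min = p
      if (p > max) max = p
    }
    END {
      if (n == 0) exit 1            # no matches: signal failure
      lo = min - pad
      if (lo < 1) lo = 1
      print lo, max + pad
    }'
}

# Hypothetical wrapper: search one PDF, then extract only the pages around the matches.
search_then_extract() {
  local pdf=$1 term=$2 range first last
  range=$(pdfgrep -n "$term" "$pdf" | matches_to_range 1) || {
    echo "no matches for '$term' in $pdf" >&2
    return 1
  }
  read -r first last <<< "$range"
  pdftotext -f "$first" -l "$last" "$pdf" -
}
```

Usage: `search_then_extract large-manual.pdf "authentication"` runs both steps; for matches on pages 42 and 45 it extracts pages 41 to 46.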

Search Commands

```bash
# Basic search
pdfgrep "search term" document.pdf

# Case-insensitive
pdfgrep -i "search term" document.pdf

# With page numbers
pdfgrep -n "search term" document.pdf

# With context (2 lines before/after)
pdfgrep -n -C 2 "search term" document.pdf

# Count matches
pdfgrep -c "search term" document.pdf

# Search all PDFs in a directory and its subdirectories
pdfgrep -r "term" /path/to/pdfs/
```
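When a recursive search covers many files, ranking them by match count shows which PDFs to open first. A minimal sketch, assuming `pdfgrep -c` prints one `FILE:COUNT` line per file when several files are searched; `rank_by_count` is a hypothetical helper. Reading the count from the last colon-separated field keeps file names that contain colons intact.

```bash
# Hypothetical helper: turn FILE:COUNT lines into a list sorted by count,
# highest first, dropping files with zero matches.
rank_by_count() {
  awk -F: '$NF + 0 > 0 { print $NF "\t" $0 }' \
    | sort -t $'\t' -k1,1nr \
    | cut -f2-
}

# Example (assumes the FILE:COUNT output format described above):
#   pdfgrep -r -c -i "authentication" /path/to/pdfs/ | rank_by_count
```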

Extract Commands

```bash
# Extract specific page range
pdftotext -f 10 -l 15 document.pdf -

# Extract single page
pdftotext -f 42 -l 42 document.pdf -

# Preserve layout (for tables)
pdftotext -layout -f 10 -l 10 document.pdf -

# Extract and limit output to the first 50 lines
pdftotext -f 10 -l 15 document.pdf - | head -n 50
```
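The extract commands can be wrapped so that a bad page range fails before pdftotext runs. A minimal sketch; `extract_pages` and its argument order (file, first page, optional last page, optional line cap defaulting to 50) are hypothetical.

```bash
# Hypothetical wrapper: extract pages FIRST..LAST as text, capped at MAX_LINES lines.
extract_pages() {
  local pdf=$1 first=$2 last=${3:-$2} max_lines=${4:-50}
  # Reject anything that is not a positive integer page range before calling pdftotext.
  case $first in ''|*[!0-9]*) echo "invalid first page: '$first'" >&2; return 2 ;; esac
  case $last in ''|*[!0-9]*) echo "invalid last page: '$last'" >&2; return 2 ;; esac
  if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
    echo "invalid range: $first-$last" >&2
    return 2
  fi
  pdftotext -f "$first" -l "$last" "$pdf" - | head -n "$max_lines"
}

# Examples:
#   extract_pages document.pdf 42          # single page, first 50 lines
#   extract_pages document.pdf 10 15 200   # pages 10-15, first 200 lines
```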

Metadata

```bash
# Get page count
pdfinfo document.pdf | grep Pages

# Full metadata
pdfinfo document.pdf
```
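For scripts, the page count is easier to use as a bare number than as the full `Pages:` line. A minimal sketch, assuming pdfinfo prints a line beginning with `Pages:` followed by the count (the same line the grep above relies on); `parse_page_count` and `pdf_page_count` are hypothetical names.

```bash
# Hypothetical helper: print only the number from the "Pages:" line of pdfinfo output.
parse_page_count() {
  awk '/^Pages:/ { print $2; found = 1 } END { if (!found) exit 1 }'
}

# Hypothetical wrapper: run pdfinfo on a file and print just its page count.
pdf_page_count() {
  pdfinfo "$1" | parse_page_count
}

# Example:
#   pages=$(pdf_page_count document.pdf) && echo "document.pdf has $pages pages"
```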

Troubleshooting

Empty output from pdftotext: the PDF is probably image-based (scanned). These tools only work with PDFs that contain a text layer; scanned documents need OCR first.

pdfgrep missing matches: Try a case-insensitive search (-i), and check that the PDF has selectable text.
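A rough way to tell an image-based PDF from a text-based one is to check whether pdftotext returns any visible characters for the first few pages. A minimal sketch; `is_blank_text` and `has_text_layer` are hypothetical helpers, and a mostly scanned PDF that still carries a small amount of real text would pass the check.

```bash
# Hypothetical helper: succeed if stdin holds no visible characters
# (only spaces, newlines, or form feeds between pages).
is_blank_text() {
  ! grep -q '[^[:space:]]'
}

# Hypothetical check: extract the first PAGES pages (default 3) and succeed
# only if pdftotext produced some visible text. A missing or unreadable file
# also yields no text, so it is reported the same way.
has_text_layer() {
  local pdf=$1 pages=${2:-3}
  ! pdftotext -f 1 -l "$pages" "$pdf" - | is_blank_text
}

# Example:
#   has_text_layer scanned.pdf || echo "no text layer found; run OCR first"
```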
