Our review

Search and extract content from PDF files without loading entire documents into context.

Strengths

Fast searching with pdfgrep, supporting case-insensitive and regex patterns.
Page-range extraction with pdftotext to minimize context usage.
Ability to preserve layout for tables and structured text.

Limitations

Does not work with scanned (image-based) PDFs.
Requires prior installation of pdfgrep and poppler-utils.
Search context is limited to a fixed number of lines before and after each match (the examples use -C 2).

When to use it

When you need to quickly locate and extract specific content from a large PDF without reading the entire file.

When not to use it

When the PDF is image-based or requires precise layout extraction (use OCR or dedicated tools instead).

Examples

Search for a term in a PDF

Search for 'authentication' in the manual.pdf file and show me the page numbers and a few lines of context around each match.

Extract pages from a PDF as text

Extract pages 10 to 15 from large-report.pdf and output the text.

Get page count of a PDF

Tell me how many pages are in the document guide.pdf.

name: pdf-tools
description: Search and extract content from PDF files. Use when searching PDFs, finding text in documents, or extracting specific pages without reading the entire file.
allowed-tools: Bash, Read, Glob

PDF Tools

Search and extract content from PDFs without loading entire files into context.

Installation

# macOS
brew install pdfgrep poppler

# Ubuntu/Debian
sudo apt install pdfgrep poppler-utils
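Before relying on the commands below, it can help to confirm the binaries are actually on PATH. This is a minimal shell sketch: missing_tools is a hypothetical helper name, and it simply checks whichever command names it is given (for this skill, pdfgrep, pdftotext and pdfinfo).

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# Returns non-zero if at least one of them is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example: missing_tools pdfgrep pdftotext pdfinfo || echo "install the tools listed above" >&2
```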

Quick Reference

| Task | Command |
|------|---------|
| Search | pdfgrep "term" file.pdf |
| Search with page numbers | pdfgrep -n "term" file.pdf |
| Search with context | pdfgrep -n -C 2 "term" file.pdf |
| Get page count | pdfinfo file.pdf \| grep Pages |
| Extract pages 5-10 | pdftotext -f 5 -l 10 file.pdf - |

Core Workflow

Step 1: Search - Find where content lives

pdfgrep -n "authentication" large-manual.pdf
# Output: 42: User authentication requires...
#         45: Authentication tokens expire...

Step 2: Extract - Get just those pages (here pages 41-46, one page of margin on either side of the matches)

pdftotext -f 41 -l 46 large-manual.pdf -
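The two steps can be chained so the page range is computed from the search output instead of read off by hand. A minimal sketch, assuming pdfgrep -n prints each match as page:text, as in the Step 1 output; the helper name page_range_from_matches and its one-page default margin are illustrative choices, not part of pdfgrep or poppler.

```shell
# Hypothetical helper: read `pdfgrep -n` output on stdin and print "FIRST LAST",
# the lowest and highest matching page widened by MARGIN pages (default 1).
# Lines whose first colon-separated field is not a bare page number are ignored.
# The upper bound is not clamped to the document's page count.
page_range_from_matches() {
  awk -F: -v m="${1:-1}" '
    $1 ~ /^[0-9]+$/ {
      p = $1 + 0
      if (n == 0 || p < lo) lo = p
      if (n == 0 || p > hi) hi = p
      n++
    }
    END {
      if (n == 0) exit 1          # no matches: signal failure
      lo -= m
      if (lo < 1) lo = 1          # never ask for page 0 or below
      print lo, hi + m
    }'
}

# Example (requires pdfgrep and pdftotext):
# pdfgrep -n "authentication" large-manual.pdf | page_range_from_matches 1 |
#   { read -r first last && pdftotext -f "$first" -l "$last" large-manual.pdf -; }
```

With the Step 1 matches on pages 42 and 45 and a margin of 1, this prints 41 46, the same range used in Step 2.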

Search Commands

# Basic search
pdfgrep "search term" document.pdf

# Case-insensitive
pdfgrep -i "search term" document.pdf

# With page numbers
pdfgrep -n "search term" document.pdf

# With context (2 lines before/after)
pdfgrep -n -C 2 "search term" document.pdf

# Count occurrences
pdfgrep -c "search term" document.pdf

# Search all PDFs in a directory (recursively)
pdfgrep -r "term" /path/to/pdfs/
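When a term appears on many pages, a per-page tally of the -n output helps decide which pages are worth extracting. A minimal sketch with a hypothetical helper name; it counts output lines per page and handles only single-file output, so lines carrying a file-name prefix (as when searching several files or a directory) are skipped.

```shell
# Hypothetical helper: tally `pdfgrep -n` output lines per page.
# Reads "page:text" lines on stdin and prints "PAGE COUNT" pairs in page order.
# Lines whose first colon-separated field is not a bare number are skipped.
matches_per_page() {
  awk -F: '$1 ~ /^[0-9]+$/ { count[$1 + 0]++ }
           END { for (p in count) print p, count[p] }' | sort -n
}

# Example: pdfgrep -n -i "search term" document.pdf | matches_per_page
```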

Extract Commands

# Extract specific page range
pdftotext -f 10 -l 15 document.pdf -

# Extract single page
pdftotext -f 42 -l 42 document.pdf -

# Preserve layout (for tables)
pdftotext -layout -f 10 -l 10 document.pdf -

# Extract and limit output to the first 50 lines
pdftotext -f 10 -l 15 document.pdf - | head -n 50
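For a long range, reading it a few pages at a time keeps each piece of output small. A minimal sketch: page_chunks is a hypothetical helper that only prints the sub-ranges; each pair is then passed to pdftotext -f/-l exactly as in the commands above.

```shell
# Hypothetical helper: split pages FIRST..LAST into chunks of at most SIZE pages,
# printing one "START END" pair per line. Fails if SIZE is less than 1.
page_chunks() {
  local first=$1 last=$2 size=$3 start end
  [ "$size" -ge 1 ] || return 1
  start=$first
  while [ "$start" -le "$last" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$last" ]; then
      end=$last
    fi
    echo "$start $end"
    start=$((end + 1))
  done
}

# Example: extract pages 10-15 of document.pdf three pages at a time.
# page_chunks 10 15 3 | while read -r f l; do
#   pdftotext -f "$f" -l "$l" document.pdf -
# done
```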

Metadata

# Get page count
pdfinfo document.pdf | grep Pages

# Full metadata
pdfinfo document.pdf
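In a script, the page count is more useful as a bare number than as the whole Pages: line. A minimal sketch; the helper name is hypothetical and it assumes pdfinfo's usual "Pages:" line, where the count is the second whitespace-separated field.

```shell
# Hypothetical helper: read `pdfinfo` output on stdin and print only the page count.
# Exits non-zero if no "Pages:" line is present.
pages_from_pdfinfo() {
  awk '/^Pages:/ { print $2; found = 1 }
       END { if (!found) exit 1 }'
}

# Example: pages=$(pdfinfo document.pdf | pages_from_pdfinfo)
```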

Troubleshooting

Empty output from pdftotext: the PDF is most likely image-based (scanned). These tools only work with PDFs that have a text layer; use OCR for scanned documents.

pdfgrep missing matches: Try a case-insensitive search (-i), and check whether the PDF has selectable text.
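Both symptoms can be checked quickly by extracting a single page and testing whether it contains anything other than whitespace. A minimal sketch; has_text is a hypothetical helper, and an empty first page is only a hint (a blank cover page would also trigger it), not proof that the file is scanned.

```shell
# Hypothetical helper: succeed if stdin contains at least one non-whitespace character.
# pdftotext separates pages with form feeds, which count as whitespace here.
has_text() {
  [ -n "$(tr -d '[:space:]')" ]
}

# Example: check page 1 for a text layer before searching.
# pdftotext -f 1 -l 1 document.pdf - | has_text || echo "no text layer found; the PDF may be scanned" >&2
```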
