Our review

A tool to scrape Snowflake documentation pages into Markdown files with caching and configurable crawling depth.

Strengths

Simple setup with automatic dependency installation
SQLite cache reduces repeated requests and speeds up updates
Flexible configuration for base path and spider depth
Structured Markdown output with frontmatter metadata

Limitations

Only works on docs.snowflake.com
May download a large number of pages if depth is set high
No command-line option for cache expiration; the 7-day default can only be changed through expiration_days in scraper_config.yaml

When to use it

When you need local copies of Snowflake documentation sections for offline reference or LLM context.

When not to use it

For scraping other websites or for real-time updates.

Examples

Scrape entire migration guide

Scrape the Snowflake documentation migration guide with default settings and save it to ./migration-docs

Scrape SQL reference with depth 2

Use doc-scraper to scrape the Snowflake SQL reference section at /en/sql-reference/ with spider depth 2, output to ./sql-docs

Dry-run preview

Run a dry run of doc-scraper for the base path /en/sql-reference/ to see which URLs will be scraped without writing files.

name: doc-scraper
description: Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.

Snowflake Documentation Scraper

Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
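A SQLite cache with a 7-day expiry can be sketched as follows. This is a minimal illustration, not doc-scraper's code: the `PageCache` class, the `pages` table, and its columns are hypothetical, and the real cache layout under `.cache/` is not documented here.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


class PageCache:
    """Hypothetical URL -> page-content cache with age-based expiry in SQLite."""

    def __init__(self, path: str = ":memory:", expiration_days: int = 7) -> None:
        self.conn = sqlite3.connect(path)
        self.max_age = expiration_days * SECONDS_PER_DAY
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            "url TEXT PRIMARY KEY, content TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def get(self, url: str, now: Optional[float] = None) -> Optional[str]:
        """Return cached content, or None if the URL is missing or older than the window."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT content, fetched_at FROM pages WHERE url = ?", (url,)
        ).fetchone()
        if row is None or now - row[1] > self.max_age:
            return None  # cache miss or stale entry: the caller should re-fetch
        return row[0]

    def put(self, url: str, content: str, now: Optional[float] = None) -> None:
        """Insert or refresh an entry, stamping it with the fetch time."""
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, content, fetched_at) VALUES (?, ?, ?)",
            (url, content, now),
        )
        self.conn.commit()
```

With `expiration_days=7`, an entry stored at time 0 is still served six days later and reads as a miss after eight, so only stale pages are requested again.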

Usage

First-time setup (automatically installs uv and doc-scraper):

python3 .claude/skills/doc-scraper/scripts/doc_scraper.py

Subsequent runs:

doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
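The commands above set only a few flags; anything omitted falls back to its documented default. A hypothetical argparse parser that mirrors the documented options and defaults (it is not doc-scraper's actual source) makes that fallback explicit:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring doc-scraper's documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="doc-scraper")
    parser.add_argument("--output-dir", required=True,
                        help="Output directory for scraped docs")
    parser.add_argument("--base-path", default="/en/migrations/",
                        help="URL section to scrape")
    parser.add_argument("--spider-depth", type=int, default=1,
                        help="0 = seeds only, 1 = + linked pages, 2 = + second-level links")
    parser.add_argument("--limit", type=int, default=None,
                        help="Cap the number of URLs (for testing)")
    parser.add_argument("--dry-run", action="store_true",
                        help="List URLs that would be scraped without writing files")
    return parser
```

Under these assumptions, `doc-scraper --output-dir=./snowflake-docs` scrapes `/en/migrations/` at depth 1 with no URL cap.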

Command Options

| Option | Default | Description |
| ---------------- | ----------------- | ------------------------------------- |
| --output-dir | Required | Output directory for scraped docs |
| --base-path | /en/migrations/ | URL section to scrape |
| --spider-depth | 1 | Link depth: 0 = seed pages only, 1 = seeds plus the pages they link to, 2 = also second-level links |
| --limit | None | Maximum number of URLs to scrape (for testing) |
| --dry-run | - | List the URLs that would be scraped without writing files |
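One way to read `--spider-depth`, `--base-path`, and `--limit` together is as a breadth-first crawl that stops following links past a given depth. The sketch below assumes that interpretation; `plan_crawl`, `in_scope`, and the injected `get_links` callback are hypothetical names, and HTML link extraction is left out. Because it only returns the planned URL list, it also illustrates what a dry-run preview could report.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse

ALLOWED_HOST = "docs.snowflake.com"  # the tool only targets this host


def in_scope(url: str, base_path: str) -> bool:
    """Keep only docs.snowflake.com URLs whose path starts with base_path."""
    parts = urlparse(url)
    return parts.netloc == ALLOWED_HOST and parts.path.startswith(base_path)


def plan_crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    base_path: str = "/en/migrations/",
    spider_depth: int = 1,
    limit: Optional[int] = None,
) -> List[str]:
    """Breadth-first walk: depth 0 = seeds, 1 = pages they link to, 2 = links of those."""
    seen = set()
    planned: List[str] = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in seen or not in_scope(url, base_path):
            continue
        seen.add(url)
        planned.append(url)
        if limit is not None and len(planned) >= limit:
            break  # the URL cap applies to the total number of pages
        if depth < spider_depth:
            for href in get_links(url):
                # Resolve relative links and drop #fragments so anchors don't duplicate pages.
                queue.append((urldefrag(urljoin(url, href)).url, depth + 1))
    return planned
```

Breadth-first order means each page is first reached at its shallowest depth, so raising the depth by one adds exactly one more ring of linked pages.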

Output

output-dir/
├── SKILL.md              # Auto-generated index
├── scraper_config.yaml   # Editable config (auto-created)
├── .cache/               # SQLite cache (auto-managed)
└── en/migrations/*.md    # Scraped pages with frontmatter
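Each scraped page lands under a path that mirrors its URL (for example en/migrations/*.md) with frontmatter on top. A minimal sketch, assuming the frontmatter carries a title, the source URL, and a scrape timestamp; these field names and the helper functions are guesses, not the tool's documented schema:

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def output_path(output_dir: str, url: str) -> Path:
    """Mirror the URL path under output_dir: /en/migrations/foo -> en/migrations/foo.md."""
    rel = urlparse(url).path.strip("/") or "index"
    return Path(output_dir) / f"{rel}.md"


def render_page(url: str, title: str, body_md: str, scraped_at: datetime) -> str:
    """Prefix the Markdown body with YAML frontmatter (field names are hypothetical)."""
    # Escape backslashes and quotes so the title stays a valid YAML double-quoted string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "---\n"
        f'title: "{safe_title}"\n'
        f"source_url: {url}\n"
        f"scraped_at: {scraped_at.isoformat()}\n"
        "---\n\n"
        f"{body_md}\n"
    )


def write_page(output_dir: str, url: str, title: str, body_md: str) -> Path:
    """Render the page and write it, creating parent directories as needed."""
    target = output_path(output_dir, url)
    target.parent.mkdir(parents=True, exist_ok=True)
    text = render_page(url, title, body_md, datetime.now(timezone.utc))
    target.write_text(text, encoding="utf-8")
    return target
```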

Configuration

Auto-created at {output-dir}/scraper_config.yaml:

rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
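These settings can be applied by overlaying the user's file on built-in defaults and then using max_concurrent_threads and max_pages to bound fetching. This sketch assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `merge_config`, `fetch_all`, and the `fetch` callback are hypothetical and not part of doc-scraper's documented interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Defaults copied from the auto-created scraper_config.yaml shown above.
DEFAULTS: Dict[str, Any] = {
    "rate_limiting": {"max_concurrent_threads": 4},
    "spider": {"max_pages": 1000, "allowed_paths": ["/en/"]},
    "scraped_pages": {"expiration_days": 7},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user settings on the defaults without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def fetch_all(urls: List[str], fetch: Callable[[str], str],
              config: Dict[str, Any]) -> Dict[str, str]:
    """Fetch at most max_pages URLs using at most max_concurrent_threads workers."""
    workers = config["rate_limiting"]["max_concurrent_threads"]
    capped = urls[: config["spider"]["max_pages"]]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with capped.
        return dict(zip(capped, pool.map(fetch, capped)))
```

Merging rather than replacing means a user file that sets only `spider.max_pages` keeps the default thread count and expiry.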

Troubleshooting

| Issue | Solution |
| ---------------- | ------------------------------------- |
| Too many pages | Lower --spider-depth, or lower spider.max_pages in scraper_config.yaml |
| Missing pages | Increase --spider-depth |
| Cache corruption (rare) | Delete {output-dir}/.cache/ |
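Before deleting .cache/, SQLite's built-in `PRAGMA integrity_check` can confirm whether a cache database is actually damaged. A sketch, assuming the cache is a single SQLite file whose path you supply (the tool's actual file name is not documented); `cache_is_healthy` and `reset_cache` are hypothetical helpers:

```python
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path


def cache_is_healthy(db_path: Path) -> bool:
    """Return True only if db_path exists and SQLite reports it as intact."""
    if not db_path.is_file():
        return False  # sqlite3.connect would silently create an empty file here
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            result = conn.execute("PRAGMA integrity_check").fetchone()
            return result is not None and result[0] == "ok"
    except sqlite3.DatabaseError:
        return False  # e.g. "file is not a database"


def reset_cache(output_dir: str) -> None:
    """Delete {output-dir}/.cache/; it is auto-managed, so presumably recreated on the next run."""
    shutil.rmtree(Path(output_dir) / ".cache", ignore_errors=True)
```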
