Snowflake Documentation Scraper

Verified · Safe

Scrapes sections of docs.snowflake.com into Markdown files with SQLite caching (7-day expiration). Configurable via command-line options for base path, spider depth, and output directory. Useful for obtaining local copies of Snowflake documentation for offline reference or automated processing.

By Skills Guide Bot
Documentation · Intermediate
2106/2/2026
Claude Code
#snowflake #documentation-scraper #web-scraping #markdown-conversion

Our review

A tool to scrape Snowflake documentation pages into Markdown files with caching and configurable crawling depth.

Strengths

  • Simple setup with automatic dependency installation
  • SQLite cache reduces repeated requests and speeds up updates
  • Flexible configuration for base path and spider depth
  • Structured Markdown output with frontmatter metadata

Limitations

  • Only works on docs.snowflake.com
  • May download a large number of pages if depth is set high
  • Cache expiration defaults to 7 days and has no command-line option; the only visible setting is expiration_days in the auto-created scraper_config.yaml

When to use it

When you need local copies of Snowflake documentation sections for offline reference or LLM context.

When not to use it

When you need to scrape sites other than docs.snowflake.com, or when you need content that is always current (cached pages can be up to 7 days old).

Security analysis

Safe
Quality score: 85/100

The skill is a documentation scraper that accesses a trusted domain (docs.snowflake.com) and writes to a local directory. It does not execute downloaded content, expose secrets, or perform destructive actions. The first-time setup uses a Python script that installs a tool via standard package managers, which is a common pattern and not inherently risky.

No concerns found

Examples

Scrape entire migration guide
Scrape the Snowflake documentation migration guide with default settings and save it to ./migration-docs

Scrape SQL reference with depth 2
Use doc-scraper to scrape the Snowflake SQL reference section at /en/sql-reference/ with spider depth 2, output to ./sql-docs

Dry-run preview
Run a dry run of doc-scraper for the base path /en/sql-reference/ to see which URLs will be scraped without writing files.

name: doc-scraper
description: Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.

Snowflake Documentation Scraper

Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
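
The skill's actual cache schema is not documented here, so the following is only a minimal sketch of how a SQLite cache with a 7-day expiration could behave. The table name `pages`, its columns, and the helper names `open_cache`, `store`, and `lookup` are all hypothetical.

```python
import sqlite3
import time

EXPIRATION_DAYS = 7        # matches the 7-day expiration described above
SECONDS_PER_DAY = 86_400


def open_cache(path=":memory:"):
    """Open (or create) a cache holding one row per scraped URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def store(conn, url, body, now=None):
    """Insert or overwrite the cached copy of a page."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, now),
    )
    conn.commit()


def lookup(conn, url, now=None):
    """Return the cached body, or None if the entry is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    body, fetched_at = row
    if now - fetched_at > EXPIRATION_DAYS * SECONDS_PER_DAY:
        return None  # stale entry: the caller should fetch the page again
    return body
```

In this sketch a `None` result means "go to the network", so a repeated run within 7 days would reuse the stored page instead of requesting it again.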

Usage

First-time setup (auto-installs uv and doc-scraper):

python3 .claude/skills/doc-scraper/scripts/doc_scraper.py

Subsequent runs:

doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2

Command Options

| Option | Default | Description |
| ---------------- | ----------------- | ------------------------------------- |
| --output-dir | Required | Output directory for scraped docs |
| --base-path | /en/migrations/ | URL section to scrape |
| --spider-depth | 1 | Link depth: 0=seeds, 1=+links, 2=+2nd |
| --limit | None | Cap URLs (for testing) |
| --dry-run | - | Preview without writing |
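
To illustrate what the --spider-depth values mean, here is a small sketch of a depth-limited breadth-first crawl over an in-memory link graph. It is not the tool's implementation: the function `crawl_order`, the `links` mapping, and the way `limit` is applied are assumptions made for illustration only.

```python
from collections import deque


def crawl_order(seeds, links, max_depth, limit=None):
    """Return URLs reachable from the seeds within max_depth link hops.

    links maps each URL to the URLs it links to (a stand-in for real fetching).
    limit, if given, caps the number of URLs returned (like --limit).
    """
    seeds = list(dict.fromkeys(seeds))      # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    visited = []
    while queue:
        if limit is not None and len(visited) >= limit:
            break
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue                        # do not follow links past max_depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical link graph used only for this example.
links = {"/a": ["/b", "/c"], "/b": ["/d"], "/d": ["/e"]}
print(crawl_order(["/a"], links, max_depth=0))  # seeds only
print(crawl_order(["/a"], links, max_depth=1))  # seeds plus pages they link to
print(crawl_order(["/a"], links, max_depth=2))  # plus second-level links
```

Each extra level of depth can multiply the number of pages, which is why lowering --spider-depth is the first remedy when a run pulls in too much.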

Output

output-dir/
├── SKILL.md              # Auto-generated index
├── scraper_config.yaml   # Editable config (auto-created)
├── .cache/               # SQLite cache (auto-managed)
└── en/migrations/*.md    # Scraped pages with frontmatter
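
The layout above suggests that each page's URL path is mirrored under the output directory. A rough sketch of that mapping, plus a simple frontmatter writer, might look like the following. The helper names, the handling of trailing slashes, and the `source_url` frontmatter field are assumptions; the fields the tool actually writes are not listed here.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def output_path(url, output_dir):
    """Map a docs URL to a Markdown path that mirrors the URL path."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if not parts:
        parts = ["index"]            # hypothetical name for the site root
    parts[-1] = parts[-1] + ".md"
    return PurePosixPath(output_dir, *parts)


def with_frontmatter(markdown, metadata):
    """Prefix Markdown with a frontmatter block built from metadata."""
    lines = ["---"]
    for key, value in metadata.items():
        # JSON scalars are also valid YAML, so json.dumps handles quoting.
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown


url = "https://docs.snowflake.com/en/migrations/guide"
print(output_path(url, "out"))
print(with_frontmatter("# Guide\n", {"source_url": url}))
```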

Configuration

Auto-created at {output-dir}/scraper_config.yaml:

rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
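
How the scraper applies these settings is not spelled out, but one plausible reading is that `allowed_paths` acts as a prefix filter and `max_pages` as a hard cap. The sketch below mirrors the config as a Python dict and applies that interpretation; `select_paths` is a hypothetical helper, not part of doc-scraper.

```python
# The YAML above, mirrored as a plain dict so the example needs no YAML library.
CONFIG = {
    "rate_limiting": {"max_concurrent_threads": 4},
    "spider": {"max_pages": 1000, "allowed_paths": ["/en/"]},
    "scraped_pages": {"expiration_days": 7},
}


def select_paths(candidates, config):
    """Keep paths under an allowed prefix, then cap the total at max_pages."""
    spider = config["spider"]
    allowed = tuple(spider["allowed_paths"])   # str.startswith accepts a tuple
    kept = [path for path in candidates if path.startswith(allowed)]
    return kept[: spider["max_pages"]]


candidates = ["/en/sql-reference/", "/fr/guide", "/en/migrations/overview"]
print(select_paths(candidates, CONFIG))
```

Under this reading, tightening `allowed_paths` or lowering `max_pages` would be the config-side way to shrink a crawl that --spider-depth alone does not contain.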

Troubleshooting

| Issue | Solution |
| ---------------- | ------------------------------------- |
| Too many pages | Lower --spider-depth or edit config |
| Missing pages | Increase --spider-depth |
| Cache corruption | Delete {output-dir}/.cache/ (rare) |
