Browser Automation with agent-browser

Verified · Caution

CLI for automating interaction with websites: navigation, form filling, data extraction, and web application testing. Well suited to scraping, automated login, and general browser automation tasks.

By Skills Guide Bot
Development · Intermediate
50 · 02/06/2026
Claude Code · Cursor · Windsurf · Copilot · Codex
#browser-automation #cli #web-testing #data-extraction #form-filling

Our take

Automates web browser interactions through a command-line interface: navigating, filling forms, clicking, extracting data, and taking screenshots.

Strengths

  • Simple interface that uses element references (@e1, @e2) to interact with the page.
  • Supports parallel sessions and state persistence (authentication).
  • Commands for screenshots, PDF export, and text extraction as JSON.

Limitations

  • Requires an understanding of element references and of the order of operations.
  • Does not natively handle sites with infinite loading or complex WebSockets.
  • Depends on the stability of the automatically generated selectors.

When to use it

Use this skill to automate repetitive web tasks such as form submission, data extraction, or user interface testing.

When to avoid it

Avoid it for interactions that require complex JavaScript logic, or in environments without a graphical browser.

Security analysis

Caution
Quality score: 92/100

The skill provides instructions for a browser automation CLI that can interact with arbitrary web pages, handle credentials, and persist session data. It uses Bash(agent-browser:*) which grants full access to the browser tool. This is a legitimate capability but carries risk if the AI agent is tricked into visiting malicious sites, leaking sensitive data, or performing unintended actions.

Points of attention
  • Uses the powerful agent-browser tool, which can navigate to any URL, fill forms, capture screenshots, and save authentication state. While not inherently malicious, misuse could lead to data exposure or unintended interactions with external sites.
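
One way to reduce the risk described above is to check a URL against an allowlist before opening it. The following is a minimal sketch, not part of agent-browser itself: the helper names (url_host, is_allowed_url, safe_open) and the ALLOWED_HOSTS list are hypothetical, and the host parsing is deliberately simple (it does not handle IPv6 literals or mixed-case hosts).

```shell
# Hypothetical allowlist of hosts the agent may visit; adjust to your own.
ALLOWED_HOSTS="example.com app.example.com"

# Print the host part of an http(s) URL.
url_host() {
  host="${1#*://}"      # strip scheme
  host="${host%%/*}"    # strip path
  host="${host%%\?*}"   # strip query string
  host="${host%%\#*}"   # strip fragment
  host="${host##*@}"    # strip userinfo (user:pass@)
  host="${host%%:*}"    # strip port
  printf '%s\n' "$host"
}

# Succeed only for https URLs whose host is on the allowlist.
is_allowed_url() {
  case "$1" in
    https://*) ;;
    *) return 1 ;;
  esac
  host="$(url_host "$1")"
  for allowed in $ALLOWED_HOSTS; do
    [ "$host" = "$allowed" ] && return 0
  done
  return 1
}

# Open a URL only if it passes the check.
safe_open() {
  if is_allowed_url "$1"; then
    agent-browser open "$1"
  else
    echo "refusing to open: $1" >&2
    return 1
  fi
}
```

Calling safe_open instead of agent-browser open keeps navigation limited to known hosts; it does not protect against redirects that happen after the page loads.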

Examples

Automate form submission
Go to https://example.com/signup, fill in the form with name 'Jane Doe', email 'jane@example.com', select 'California' from the dropdown, check the terms checkbox, and submit. Take a screenshot after submission.
Login and save session
Log into https://app.example.com/login with username 'user' and password 'pass', wait for the dashboard to load, save the authentication state to 'auth.json', then reload it in a new session and navigate to the dashboard.
Scrape product data
Open https://example.com/products, get the text of the first product element, then take a snapshot with JSON output to extract all product names and prices.

name: agent-browser
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
allowed-tools: Bash(agent-browser:*)

Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
  3. Interact: Use refs to click, fill, select
  4. Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Essential Commands

# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll page

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser pdf output.pdf          # Save as PDF

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
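
After the submit step, it can help to confirm that the browser actually landed where expected. The sketch below uses only the get url command listed under Essential Commands; the helper name assert_url_contains and the "/welcome" path are hypothetical, and it assumes get url prints the current URL on standard output.

```shell
# Succeed if the current page URL contains the given fragment.
# Assumes `agent-browser get url` prints the URL on stdout.
assert_url_contains() {
  current_url="$(agent-browser get url)" || return 1
  case "$current_url" in
    *"$1"*) return 0 ;;
    *) echo "unexpected URL: $current_url" >&2; return 1 ;;
  esac
}

# Usage after the submit click and wait (the "/welcome" path is hypothetical):
#   assert_url_contains "/welcome" || agent-browser screenshot
```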

Authentication with State Persistence

# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
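
The two halves of this pattern can be combined into one helper that logs in only when no saved state exists. This is a sketch under assumptions: the function name ensure_logged_in and the STATE_FILE variable are hypothetical, the refs are assumed to match the login example above, and agent-browser is assumed to exit non-zero when a step fails.

```shell
# Hypothetical helper: reuse saved state when present, otherwise log in once.
# Assumes USERNAME and PASSWORD are set in the environment and that the
# snapshot refs match the login example (@e1 user, @e2 password, @e3 submit).
STATE_FILE="${STATE_FILE:-auth.json}"

ensure_logged_in() {
  if [ -f "$STATE_FILE" ]; then
    agent-browser state load "$STATE_FILE"
    return
  fi
  agent-browser open "$1" &&
    agent-browser snapshot -i >/dev/null &&
    agent-browser fill @e1 "$USERNAME" &&
    agent-browser fill @e2 "$PASSWORD" &&
    agent-browser click @e3 &&
    agent-browser wait --url "**/dashboard" &&
    agent-browser state save "$STATE_FILE" &&
    chmod 600 "$STATE_FILE"   # the file holds session data; keep it private
}

# Usage:
#   ensure_logged_in https://app.example.com/login
#   agent-browser open https://app.example.com/dashboard
```

A saved session can expire; if the dashboard redirects back to the login page, deleting the state file and calling the helper again forces a fresh login.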

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
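
Once the page text is on standard output, ordinary shell tools can pick out the fields of interest. The sketch below filters price-like tokens; the helper name extract_prices and the "$" plus digits format are assumptions about the target page, not something agent-browser provides.

```shell
# Print every price-like token (e.g. $19.99) found on stdin, one per line.
# The "$" + digits format is an assumption about the target page.
extract_prices() {
  grep -Eo '\$[0-9]+(\.[0-9]{2})?'
}

# Usage (requires a page already open in agent-browser):
#   agent-browser get text body | extract_prices
```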

Parallel Sessions

agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i

agent-browser session list
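
Because each session is addressed by name, several sites can be processed at the same time with background jobs. The following is a sketch only: the function name snapshot_all, the generated session names (site1, site2, ...) and the output file names are hypothetical, and it assumes snapshot writes its listing to standard output.

```shell
# Snapshot several sites concurrently, one named session per site.
# Session names (site1, site2, ...) and output file names are illustrative.
snapshot_all() {
  out_dir="$1"
  shift
  mkdir -p "$out_dir"
  n=0
  for url in "$@"; do
    n=$((n + 1))
    (
      agent-browser --session "site$n" open "$url" &&
        agent-browser --session "site$n" snapshot -i > "$out_dir/site$n.txt"
    ) &
  done
  wait   # block until every background job has finished
}

# Usage:
#   snapshot_all ./snapshots https://site-a.com https://site-b.com
```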

Visual Browser (Debugging)

agent-browser --headed open https://example.com
agent-browser highlight @e1          # Highlight element
agent-browser record start demo.webm # Record session

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile-specific gesture

# Take screenshot
agent-browser -p ios screenshot mobile.png

# Close session (shuts down simulator)
agent-browser -p ios close

Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)

Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

  • Clicking links or buttons that navigate
  • Form submissions
  • Dynamic content loading (dropdowns, modals)

agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs
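
To make the re-snapshot step hard to forget, the click, wait, and snapshot commands can be bundled into one function. This is a small sketch; the name click_and_resnapshot is hypothetical, and it assumes each agent-browser command exits non-zero on failure so that the chain stops early.

```shell
# Click a ref, wait for the page to settle, then print a fresh snapshot.
# Refs from before the click should be treated as stale afterwards.
click_and_resnapshot() {
  agent-browser click "$1" &&
    agent-browser wait --load networkidle &&
    agent-browser snapshot -i
}

# Usage:
#   click_and_resnapshot @e5
```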

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click

Deep-Dive Documentation

| Reference | When to Use |
| --- | --- |
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |

Ready-to-Use Templates

| Template | Description |
| --- | --- |
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |

./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output