# Skill: Connect Data

## Purpose

Guided wizard to connect a new dataset. Walks the user through selecting a connection type, configuring credentials, validating the connection, profiling the schema, and setting up the knowledge brain.
## When to Use

- User says `/connect-data`, "connect my database", or "add a new dataset"
- First-run welcome suggests connecting data
- After `/switch-dataset` when the target dataset doesn't exist yet
## Invocation

- `/connect-data` — start the connection wizard
- `/connect-data type=postgres` — skip type selection
## Instructions

### Step 1: Choose Connection Type

Present options:
- CSV files — "I have CSV files in a local directory"
- DuckDB — "I have a local DuckDB database file"
- MotherDuck — "I have a MotherDuck cloud database"
- PostgreSQL — "I have a PostgreSQL database"
- BigQuery — "I have a Google BigQuery dataset"
- Snowflake — "I have a Snowflake warehouse"
### Step 2: Collect Connection Details

For CSV:

- Ask: "What's the path to your CSV directory? (relative to this repo)"
- Verify the directory exists and contains `.csv` files
- List the found files and ask the user to confirm
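The CSV check above can be sketched as a small helper. The function name and return shape are illustrative, not part of the skill's actual helpers:

```python
from pathlib import Path

def find_csv_files(directory: str) -> list[str]:
    """Return sorted .csv filenames in a directory, or raise if it doesn't exist."""
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(f"Directory not found: {directory}")
    return sorted(f.name for f in path.glob("*.csv"))
```

Sorting the result makes the confirmation list stable across runs.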
For DuckDB:

- Ask: "Path to your .duckdb file?"
- Verify the file exists
- Test the connection with `SELECT 1`
For MotherDuck:
- Ask: "Database name and schema?"
- Note: "MotherDuck connects via MCP. Make sure your token is configured."
For PostgreSQL / BigQuery / Snowflake:

- Copy the appropriate template from `connection_templates/`
- Ask the user to fill in the required fields
- IMPORTANT: Never ask for or store passwords directly. Guide the user to use environment variables (e.g., `$PG_PASSWORD`).
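The environment-variable rule means the manifest stores only a reference like `$PG_PASSWORD`, resolved at connect time. A minimal sketch (the function name is an assumption, not the skill's real helper):

```python
import os

def resolve_credential(ref: str) -> str:
    """Expand an environment-variable reference like '$PG_PASSWORD'.

    The manifest stores only the reference; the secret lives in the environment.
    """
    if ref.startswith("$"):
        value = os.environ.get(ref[1:])
        if value is None:
            raise KeyError(f"Environment variable {ref[1:]} is not set")
        return value
    return ref  # literal (non-secret) values pass through unchanged
```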
### Step 3: Create Dataset Brain

- Generate a `dataset_id` from the display name (lowercase, hyphens)
- Create the `.knowledge/datasets/{id}/` directory
- Write `manifest.yaml` from the connection template + user inputs
- Create an empty `quirks.md` with section headers
- Create an empty `metrics/index.yaml`
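The "lowercase, hyphens" rule for `dataset_id` amounts to a slug function, sketched here (the function name is illustrative):

```python
import re

def make_dataset_id(display_name: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into hyphens, trim edges."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower())
    return slug.strip("-")
```

For example, "My Sales Data!" becomes `my-sales-data`.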
### Step 4: Test Connection

Use `ConnectionManager` from `helpers/connection_manager.py`:

- Instantiate it with the new config
- Call `test_connection()`
- If it fails: show the error, offer to retry or edit the config
- If it passes: proceed
### Step 5: Profile Schema

- Call `list_tables()` to enumerate tables
- For each table: get column names and types via `get_table_schema()`
- Generate `schema.md` using `schema_to_markdown()` from `helpers/data_helpers.py`
- Write it to `.knowledge/datasets/{id}/schema.md`
- Offer to run full data profiling: "Want me to deep-profile this dataset?"
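A plausible shape for `schema_to_markdown()` — the real signature in `helpers/data_helpers.py` isn't shown here, so this is a stand-in that assumes a `{table: [(column, type), ...]}` input:

```python
def schema_to_markdown(tables: dict[str, list[tuple[str, str]]]) -> str:
    """Render {table: [(column, type), ...]} as a Markdown schema document."""
    lines = ["# Schema", ""]
    for table, columns in tables.items():
        lines.append(f"## {table}")
        lines.append("")
        lines.append("| Column | Type |")
        lines.append("|--------|------|")
        for name, dtype in columns:
            lines.append(f"| {name} | {dtype} |")
        lines.append("")
    return "\n".join(lines)
```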
### Step 6: Set Active

- Update `.knowledge/active.yaml` to point to the new dataset
- Confirm: "Connected! {display_name} is now your active dataset."
- Show: table count, estimated row count, date range (if detected)
- Suggest next steps: `/explore` to browse, `/metrics` to define metrics, or just ask a question
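Updating the pointer file can be sketched like this. The single `active:` key is an assumption — the real `active.yaml` layout isn't specified above:

```python
from pathlib import Path

def set_active_dataset(dataset_id: str, knowledge_dir: str = ".knowledge") -> Path:
    """Point .knowledge/active.yaml at the newly connected dataset."""
    path = Path(knowledge_dir) / "active.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"active: {dataset_id}\n")
    return path
```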
## Rules
- Never store credentials in plain text in manifest files
- Always test the connection before declaring success
- Always generate a `schema.md` — it's required for analysis
- Create the full `.knowledge/datasets/{id}/` tree even if profiling fails
- If the user already has this dataset, ask before overwriting
## Edge Cases
- Directory doesn't exist: Offer to create it
- No CSV files found: Check for other formats (.parquet, .json)
- Connection fails repeatedly: Suggest checking credentials, firewall, VPN
- Schema too large (>100 tables): Profile only, skip per-table details
- Dataset name collision: Append a number (e.g., "mydata-2")
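The name-collision rule ("mydata" → "mydata-2") can be sketched as a small loop; the function name and `existing` set are illustrative:

```python
def resolve_collision(dataset_id: str, existing: set[str]) -> str:
    """Append -2, -3, ... until the id doesn't clash with an existing dataset."""
    if dataset_id not in existing:
        return dataset_id
    n = 2
    while f"{dataset_id}-{n}" in existing:
        n += 1
    return f"{dataset_id}-{n}"
```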