Our take
This skill reviews and evaluates agent conversations to identify improvements to tools, instructions, and documentation, focusing on system changes rather than on answer correctness.
Strengths
- Structured evaluation framework covering instruction quality, tool adequacy, trajectory efficiency, and relevance.
- Produces concrete, actionable suggestions for system improvements.
- Generates detailed trajectory files for in-depth analysis.
- Focuses on what system developers can change, not on what the model knows.
Limitations
- Requires access to the specific trajectory generation scripts.
- Evaluation can take a long time for complex conversations.
- Assumes a clear separation between user intent and system capabilities.
Use this skill when you need to systematically analyze an agent conversation to find ways to improve its instructions, tools, or documentation.
Do not use it if you only need to evaluate the correctness of a single answer, or if you do not have access to the trajectory generation tools.
Security analysis
Safe
The skill runs a fixed Python script to generate trajectory files. It does not execute arbitrary user input, exfiltrate data, or perform destructive actions. The command is idempotent and limited to generating analysis artifacts.
No concerns detected
Examples
- Please review chat ID abc123 and identify any issues with the instructions or tools that could be improved to help the agent perform better.
- Analyze the trajectory of my last conversation with the agent and suggest improvements to the system prompt or tool descriptions.
- Evaluate the agent's performance in chat xyz789 and identify any tool gaps that caused inefficiency or unnecessary steps.
name: analyse-trajectory
description: >
  Review and evaluate agent conversations to find improvements for tooling, instructions, and documentation. The goal is NOT to judge answer correctness but to identify what system changes would help the agent take better trajectories. Use when asked to: review a chat, evaluate agent performance, find tooling improvements, analyze a conversation, inspect what happened in a chat, or when given a chat ID to review.
Analyse Trajectory
Generate trajectory files
uv run python -c "from varro.playground.trajectory import generate_chat_trajectory; print(generate_chat_trajectory(user_id=USER_ID, chat_id=CHAT_ID))"
Idempotent: turns regenerate only when turn.md is missing or .trajectory_version is outdated.
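The regeneration rule above can be sketched as a small check. This is only an illustration, not the code inside varro.playground.trajectory: the function name, the version value, and the assumption that .trajectory_version sits next to turn.md in each turn directory are all hypothetical.

```python
from pathlib import Path

# Hypothetical value; the real version constant lives inside the varro package.
CURRENT_TRAJECTORY_VERSION = "2"


def turn_needs_regeneration(
    turn_dir: Path, current_version: str = CURRENT_TRAJECTORY_VERSION
) -> bool:
    """Return True when a turn directory should be regenerated.

    Assumption: .trajectory_version is stored alongside turn.md.
    """
    turn_file = turn_dir / "turn.md"
    version_file = turn_dir / ".trajectory_version"

    # Rule 1: regenerate when turn.md is missing.
    if not turn_file.exists():
        return True

    # Rule 2: regenerate when the version marker is missing or outdated.
    if not version_file.exists():
        return True
    stored_version = version_file.read_text(encoding="utf-8").strip()
    return stored_version != current_version
```

Under these assumptions, running the generator twice in a row leaves up-to-date turns untouched, which is what makes the command safe to repeat.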
Trajectory file structure
Output at data/trajectory/{user_id}/{chat_id}/:
chat.md # one-line summary per turn: user input, tools, final excerpt
system_instructions.md # full system prompt given to the agent
tool_instructions.md # all tools with descriptions and parameter schemas
{turn_idx}/
  turn.md # trajectory: User → Steps (Thinking/Actions/Observations) → Final response → Usage
  tool_calls/ # extracted .sql, .py, large .txt results
  images/ # extracted plots and images
Review process
- Read chat.md for the overview
- Read system_instructions.md and tool_instructions.md once to understand what the agent was given
- For each turn, read turn.md and inspect extracted artifacts in tool_calls/
- Evaluate each turn against the framework below
- Write findings to data/trajectory/{user_id}/{chat_id}/findings.md
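The reading order described in the review process can be sketched as a helper that lists the files to open, in sequence. The function name is hypothetical, and the sketch assumes that each {turn_idx} directory is named with a plain integer.

```python
from pathlib import Path


def review_reading_order(chat_dir: Path) -> list[Path]:
    """List trajectory files in the order the review process reads them."""
    # Overview first, then the instructions the agent was given.
    ordered = [
        chat_dir / "chat.md",
        chat_dir / "system_instructions.md",
        chat_dir / "tool_instructions.md",
    ]

    # Assumption: turn directories are named with their integer index.
    turn_dirs = sorted(
        (p for p in chat_dir.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )

    for turn_dir in turn_dirs:
        ordered.append(turn_dir / "turn.md")
        tool_calls = turn_dir / "tool_calls"
        if tool_calls.is_dir():
            # Extracted .sql, .py, and large .txt artifacts for this turn.
            ordered.extend(sorted(p for p in tool_calls.iterdir() if p.is_file()))
    return ordered
```

Sorting on the integer value rather than the directory name keeps turn 10 after turn 2.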
Evaluation framework
Focus on what system builders can change (instructions, tools, documentation), not on what the model should have known.
Instructions quality
Does the system prompt give the agent precise enough guidance?
- Agent guessing at workflow steps that instructions could have specified
- Agent ignoring instructions that exist (too buried or unclear)
- Missing guidance for a common question pattern
- Ambiguity that caused the agent to pick a suboptimal path
Tool adequacy
Do tools return clear, actionable output that makes the next decision obvious?
- Tool output missing information the agent needed next (row count, available levels, column names)
- Agent calling the same tool repeatedly to get information one call could have returned
- Agent working around a tool limitation using Bash/SQL when a dedicated tool or a small tool change would be cleaner
- Tool descriptions that are misleading or incomplete
- Fuzzy matching returning unhelpful results
Trajectory efficiency
Did the agent take unnecessary steps because of instruction or tool gaps?
- Steps that only exist because prior tool output was incomplete
- Exploratory steps that instructions could have eliminated
- Repeated queries that differ only in filter values the agent was searching for
- Trial-and-error discovery of something documentation could have stated
- NameError on a prior-turn dataframe may indicate shell state was lost (CLI restart, idle eviction) rather than agent misuse; don't count it as a tool error
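One way to spot "repeated queries that differ only in filter values" is to replace literals with a placeholder and group queries by the resulting shape. The sketch below is a rough heuristic with hypothetical function names; it uses regular expressions rather than a SQL parser, so unusual quoting or comments may confuse it.

```python
import re
from collections import defaultdict

# Single-quoted SQL strings, allowing '' as an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integers or decimals (digits inside identifiers are not matched).
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def query_shape(sql: str) -> str:
    """Replace literals with '?' and normalise case and whitespace."""
    shape = _STRING_LITERAL.sub("?", sql)
    shape = _NUMBER_LITERAL.sub("?", shape)
    return " ".join(shape.lower().split())


def repeated_query_shapes(queries: list[str]) -> dict[str, list[int]]:
    """Map each shape seen more than once to the indices of its queries."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, sql in enumerate(queries):
        groups[query_shape(sql)].append(index)
    return {shape: idxs for shape, idxs in groups.items() if len(idxs) > 1}
```

A shape that shows up several times within one turn is a hint that the agent was searching for the right filter value, which a ColumnValues call or richer tool output might have supplied directly.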
Relevance
Is the user question within scope for the state statistician?
- Questions the agent shouldn't need to handle (general chat, non-data questions)
- Questions that are borderline — note whether the agent should redirect or attempt
Output format
Write findings to data/trajectory/{user_id}/{chat_id}/findings.md:
# Review: Chat {chat_id}
## Summary
{1-3 sentences: what the user asked, overall assessment of how the system supported the agent}
## Findings
### {short title}
**Dimension**: {Instructions | Tool | Trajectory | Documentation}
**Turn**: {turn_idx}, Step {step_idx}
**Observation**: {What happened — reference actual tool calls and results}
**Suggestion**: {Concrete change to instructions, tool output, or documentation}
**Impact**: {Steps saved, or what class of questions this helps}
...
## Verdict
{The single most impactful improvement from this review}
Guidelines:
- Be concrete. Reference actual step numbers, tool calls, and results.
- Suggest specific changes. "Add row count to Sql tool output" not "improve tool output."
- Estimate impact. "Would save 2-3 steps for geographical queries" is useful.
- One finding per root cause. Group repeated issues across turns.
- Skip clean turns — only note what can be improved.
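The findings.md template above can also be filled in programmatically. The sketch below is one possible shape, not part of the skill itself: the Finding dataclass and render_review function are hypothetical names, and the field order simply mirrors the template.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    dimension: str  # Instructions | Tool | Trajectory | Documentation
    turn_idx: int
    step_idx: int
    observation: str
    suggestion: str
    impact: str


def render_review(
    chat_id: str, summary: str, findings: list[Finding], verdict: str
) -> str:
    """Render a review in the findings.md layout described above."""
    lines = [f"# Review: Chat {chat_id}", "## Summary", summary, "## Findings"]
    for finding in findings:
        lines.extend(
            [
                f"### {finding.title}",
                f"**Dimension**: {finding.dimension}",
                f"**Turn**: {finding.turn_idx}, Step {finding.step_idx}",
                f"**Observation**: {finding.observation}",
                f"**Suggestion**: {finding.suggestion}",
                f"**Impact**: {finding.impact}",
            ]
        )
    lines.extend(["## Verdict", verdict])
    return "\n".join(lines) + "\n"
```

The returned string can then be written to data/trajectory/{user_id}/{chat_id}/findings.md, for example with pathlib.Path.write_text.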
Agent environment (reference)
The reviewed agent (Rigsstatistikeren) operates in a sandboxed filesystem:
/subjects/{root}/{mid}/{leaf}.md — subject overviews listing available tables
/fact/{root}/{mid}/{leaf}/{id}.md — per-table docs: columns, joins, value ranges
/dim/ — dimension table docs
/dashboard/ — saved dashboard definitions
/skills/ — guides for complex tasks (e.g., dashboard creation)
Tools: ColumnValues, Sql, Jupyter, Read, Write, Edit, Bash, UpdateUrl, Snapshot, WebSearch
Typical efficient trajectory for data analysis:
- Identify subject area → Bash ls
- Read subject overview → Read
- Read table docs → Read
- Check column values → ColumnValues
- Query data → Sql with df_name
- Visualize → Jupyter with show
- Explain → final response
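As a rough aid when judging trajectory efficiency, an observed tool sequence can be compared against the typical one listed above. This is a minimal sketch with hypothetical names; it only checks that the expected tools appear in order and counts how many additional calls were made, which is a starting point for review rather than a verdict.

```python
from typing import Optional

# Tool sequence taken from the typical efficient trajectory above.
EXPECTED_TOOLS = ["Bash", "Read", "Read", "ColumnValues", "Sql", "Jupyter"]


def compare_to_typical(
    observed: list[str], expected: list[str] = EXPECTED_TOOLS
) -> tuple[bool, Optional[int]]:
    """Return (expected is an in-order subsequence of observed, extra calls).

    The extra-call count is None when the expected sequence is not matched.
    """
    remaining = iter(observed)
    # "tool in remaining" advances the iterator, so order is enforced.
    matched = all(tool in remaining for tool in expected)
    if not matched:
        return False, None
    return True, len(observed) - len(expected)
```

Extra calls are not automatically a problem; each one is worth tracing back to a missing instruction or an incomplete tool output before counting it as a finding.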