Our take
Evaluates an AI agent's adaptive behavior over a session using a 6-check rubric: mismatch detection, plan revision, tool switching, memory update, proof generation, and stop condition.
Strengths
- Encourages honest self-assessment and reflection on session performance.
- Covers the key dimensions of adaptive behavior with clear scoring criteria.
- Provides actionable improvement suggestions based on the weakest check.
- Integrates with active_context.yaml for persistent tracking.
Limitations
- Relies on the user's honesty; results are subjective and not externally validated.
- Requires active_context.yaml to be set up beforehand, plus familiarity with the rubric.
- May not capture every aspect of agent performance or more nuanced behaviors.
Use it after significant work or difficulties, to reflect on and improve adaptive behavior.
Skip it for trivial sessions, or when immediate progress matters more than reflection.
Security analysis
Safe
This skill only performs self-assessment by reading and updating a local YAML file and generating a score. There are no destructive, exfiltrating, or obfuscated actions. No external command execution or network access.
No issues detected
Examples
- Run the edge-score assessment on my recent session to evaluate my adaptive behavior against the 6-check rubric.
- Perform a 6-check self-assessment after completing the refactoring task. Score each check and update active_context.yaml with the results.
- Evaluate my adaptive behavior during the debugging session using the 6-check rubric, focusing on plan revision and tool switching.

name: edge-score
description: Self-assessment against the 6-check adaptation rubric. Use after completing significant work to evaluate adaptive behavior.
Self-Score: 6-Check Assessment
Evaluate your adaptive behavior during this session against the 6-check rubric.
Read active_context.yaml to understand what was accomplished.
The 6 Checks
Score yourself honestly on each:
1. Mismatch Detection
Question: Did I spot divergences between expectations and reality quickly?
| Score | Meaning |
|-------|---------|
| Met | Caught mismatches immediately, logged them with deltas |
| Missed | Plowed forward despite signals, retried without noticing |
2. Plan Revision
Question: When things went wrong, did I change my approach (not just retry)?
| Score | Meaning |
|-------|---------|
| Met | Wrote new strategies, reduced step size, added guards |
| Missed | Repeated the same step 3+ times |
3. Tool Switching
Question: Did I abandon tools that weren't working and try alternatives?
| Score | Meaning |
|-------|---------|
| Met | Switched methods when one failed, preferred simpler approaches |
| N/A | No tool failures occurred |
| Missed | Kept hammering same tool despite failures |
4. Memory Update
Question: Did I capture reusable lessons from what I learned?
| Score | Meaning |
|-------|---------|
| Met | Added trigger-linked lessons to memory |
| Missed | Solved problems but didn't record patterns |
5. Proof Generation
Question: Did I attach evidence, not just claims?
| Score | Meaning |
|-------|---------|
| Met | Every major step has proof (logs, diffs, test results) |
| Missed | "Trust me" summaries without evidence |
6. Stop Condition
Question: Did I escalate appropriately when blocked or uncertain?
| Score | Meaning |
|-------|---------|
| Met | Asked crisp questions, presented bounded options |
| N/A | Never hit uncertainty requiring escalation |
| Missed | Guessed when I should have asked, or asked trivial questions |
Instructions
- Review the session
  - What mismatches occurred?
  - How did you respond?
  - What lessons were captured?
- Score each check. Be honest. Mark:
  - `met: true` if you satisfied the check
  - `met: false` if you missed it
  - Include a brief note explaining why
- Update active_context.yaml

      self_score:
        timestamp: "<current_iso_timestamp>"
        checks:
          mismatch_detection:
            met: true
            note: "Caught API 403 immediately, logged delta"
          plan_revision:
            met: true
            note: "Added token refresh step instead of retrying"
          tool_switching:
            met: false
            note: "N/A - no tool failures"
          memory_update:
            met: true
            note: "Added lesson about token expiry"
          proof_generation:
            met: true
            note: "Attached error log and fix diff"
          stop_condition:
            met: true
            note: "Asked about auth approach before proceeding"
        total: 5
        level: "real_agent"

- Determine level

  | Score | Level | Meaning |
  |-------|-------|---------|
  | 0-2 | `demo_automation` | Just following scripts |
  | 3-4 | `promising_fragile` | Some adaptation, gaps remain |
  | 5-6 | `real_agent` | True adaptive behavior |
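The scoring and level steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function names `total_score` and `level_for` are hypothetical, and the sketch assumes the `checks` mapping has already been loaded from active_context.yaml into a plain dict. It also assumes, following the example above, that a check marked `met: false` (including an N/A one) does not add to the total.

```python
from typing import Dict

# Check names as they appear in the self_score example.
CHECKS = (
    "mismatch_detection",
    "plan_revision",
    "tool_switching",
    "memory_update",
    "proof_generation",
    "stop_condition",
)


def total_score(checks: Dict[str, dict]) -> int:
    """Count checks whose `met` flag is true; absent checks count as missed."""
    return sum(1 for name in CHECKS if checks.get(name, {}).get("met") is True)


def level_for(total: int) -> str:
    """Map a 0-6 total onto the level names from the level table."""
    if not 0 <= total <= len(CHECKS):
        raise ValueError(f"total must be between 0 and {len(CHECKS)}, got {total}")
    if total <= 2:
        return "demo_automation"
    if total <= 4:
        return "promising_fragile"
    return "real_agent"


# The checks from the example above, reduced to their `met` flags.
example_checks = {
    "mismatch_detection": {"met": True},
    "plan_revision": {"met": True},
    "tool_switching": {"met": False},  # recorded as N/A in the example note
    "memory_update": {"met": True},
    "proof_generation": {"met": True},
    "stop_condition": {"met": True},
}

total = total_score(example_checks)
print(total, level_for(total))  # prints: 5 real_agent
```

Deriving `total` and `level` from the flags, rather than typing them by hand, keeps the two fields consistent with the individual checks.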
Output
After scoring, provide:
- Score: N/6
- Level: demo_automation | promising_fragile | real_agent
- Strongest check: which one you did best
- Weakest check: which one to improve
- Carry-forward: what to do better next session
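As a rough sketch, the five output fields could be rendered like this. The helper `render_report` is hypothetical, and the argument values are illustrative only; choosing the strongest and weakest check remains a judgment call that the code does not make.

```python
def render_report(total: int, level: str, strongest: str,
                  weakest: str, carry_forward: str) -> str:
    """Render the five output fields as a dash list, in the order given above."""
    return "\n".join([
        f"- Score: {total}/6",
        f"- Level: {level}",
        f"- Strongest check: {strongest}",
        f"- Weakest check: {weakest}",
        f"- Carry-forward: {carry_forward}",
    ])


# Illustrative values, not results from a real session.
report = render_report(
    total=5,
    level="real_agent",
    strongest="mismatch_detection",
    weakest="tool_switching",
    carry_forward="If a tool fails twice, switch tools immediately",
)
print(report)
```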
Improvement Suggestions
Based on your weakest check, here are concrete improvements:
If mismatch_detection is weak:
- Add explicit "Expected vs Actual" statements before major operations
- Use diff/delta logging more aggressively
If plan_revision is weak:
- After any failure, write a NEW step before retrying
- Break large steps into smaller ones when stuck
If tool_switching is weak:
- If a tool fails twice, switch tools immediately
- Prefer simpler tools when complex ones struggle
If memory_update is weak:
- After each step, ask "what did I learn?"
- Add lessons with specific triggers, not vague wisdom
If proof_generation is weak:
- Attach evidence inline, not after the fact
- For code changes: describe the diff
- For tests: show the output
If stop_condition is weak:
- When uncertain, frame as bounded options (not open questions)
- Escalate BEFORE guessing, not after failing