Auto-évaluation : Rubrique 6 Contrôles

VérifiéSûr

Outil d'auto-évaluation basé sur six critères pour analyser le comportement adaptatif après un travail significatif. Permet d'identifier les points forts et faibles dans la détection d'écarts, la révision de plan, le changement d'outils, la mise à jour de mémoire, la production de preuves et les conditions d'arrêt. À utiliser après des sessions importantes pour suivre les progrès.

Spar Skills Guide Bot
DeveloppementIntermédiaire
8002/06/2026
Claude Code
#self-assessment#adaptive-behavior#agent-improvement#session-review#retrospective

Recommandé pour

Notre avis

Évalue le comportement adaptatif d'un agent IA au cours d'une session à l'aide d'une rubrique de 6 critères : détection des inadéquations, révision du plan, changement d'outil, mise à jour de la mémoire, génération de preuves et condition d'arrêt.

Points forts

  • Encourage une auto-évaluation honnête et une réflexion sur la performance de la session.
  • Couvre les dimensions clés du comportement adaptatif avec des critères de notation clairs.
  • Fournit des suggestions d'amélioration exploitables en fonction du critère le plus faible.
  • S'intègre à active_context.yaml pour un suivi persistant.

Limites

  • Repose sur l'honnêteté de l'utilisateur ; les résultats sont subjectifs et non validés extérieurement.
  • Nécessite la configuration préalable de active_context.yaml et une familiarité avec la rubrique.
  • Peut ne pas capturer tous les aspects de la performance de l'agent ou les comportements nuancés.
Quand l'utiliser

Après un travail significatif ou des difficultés pour réfléchir et améliorer le comportement adaptatif.

Quand l'éviter

Pour des sessions triviales ou lorsque le progrès immédiat est plus important que la réflexion.

Analyse de sécurité

Sûr
Score qualité90/100

This skill only performs self-assessment by reading a local YAML file and generating a score. There are no destructive, exfiltrating, or obfuscated actions. No external command execution or network access.

Aucun point d'attention détecté

Exemples

Post-session self-assessment
Run the edge-score assessment on my recent session to evaluate my adaptive behavior against the 6-check rubric.
After refactoring review
Perform a 6-check self-assessment after completing the refactoring task. Score each check and update active_context.yaml with the results.
Recovery from failure
Evaluate my adaptive behavior during the debugging session using the 6-check rubric, focusing on plan revision and tool switching.

name: edge-score description: Self-assessment against the 6-check adaptation rubric. Use after completing significant work to evaluate adaptive behavior.

Self-Score: 6-Check Assessment

Evaluate your adaptive behavior during this session against the 6-check rubric.

Read active_context.yaml to understand what was accomplished.

The 6 Checks

Score yourself honestly on each:

1. Mismatch Detection

Question: Did I spot divergences between expectations and reality quickly?

| Score | Meaning | |-------|---------| | Met | Caught mismatches immediately, logged them with deltas | | Missed | Plowed forward despite signals, retried without noticing |

2. Plan Revision

Question: When things went wrong, did I change my approach (not just retry)?

| Score | Meaning | |-------|---------| | Met | Wrote new strategies, reduced step size, added guards | | Missed | Repeated the same step 3+ times |

3. Tool Switching

Question: Did I abandon tools that weren't working and try alternatives?

| Score | Meaning | |-------|---------| | Met | Switched methods when one failed, preferred simpler approaches | | N/A | No tool failures occurred | | Missed | Kept hammering same tool despite failures |

4. Memory Update

Question: Did I capture reusable lessons from what I learned?

| Score | Meaning | |-------|---------| | Met | Added trigger-linked lessons to memory | | Missed | Solved problems but didn't record patterns |

5. Proof Generation

Question: Did I attach evidence, not just claims?

| Score | Meaning | |-------|---------| | Met | Every major step has proof (logs, diffs, test results) | | Missed | "Trust me" summaries without evidence |

6. Stop Condition

Question: Did I escalate appropriately when blocked or uncertain?

| Score | Meaning | |-------|---------| | Met | Asked crisp questions, presented bounded options | | N/A | Never hit uncertainty requiring escalation | | Missed | Guessed when should have asked, or asked trivial questions |

Instructions

  1. Review the session

    • What mismatches occurred?
    • How did you respond?
    • What lessons were captured?
  2. Score each check Be honest. Mark:

    • met: true if you satisfied the check
    • met: false if you missed it
    • Include a brief note explaining why
  3. Update active_context.yaml

    self_score:
      timestamp: "<current_iso_timestamp>"
      checks:
        mismatch_detection:
          met: true
          note: "Caught API 403 immediately, logged delta"
        plan_revision:
          met: true
          note: "Added token refresh step instead of retrying"
        tool_switching:
          met: false
          note: "N/A - no tool failures"
        memory_update:
          met: true
          note: "Added lesson about token expiry"
        proof_generation:
          met: true
          note: "Attached error log and fix diff"
        stop_condition:
          met: true
          note: "Asked about auth approach before proceeding"
      total: 5
      level: "real_agent"
    
  4. Determine level

    | Score | Level | Meaning | |-------|-------|---------| | 0-2 | demo_automation | Just following scripts | | 3-4 | promising_fragile | Some adaptation, gaps remain | | 5-6 | real_agent | True adaptive behavior |

Output

After scoring, provide:

  • Score: N/6
  • Level: demo_automation | promising_fragile | real_agent
  • Strongest check: which one you did best
  • Weakest check: which one to improve
  • Carry-forward: what to do better next session

Improvement Suggestions

Based on your weakest check, here are concrete improvements:

If mismatch_detection is weak:

  • Add explicit "Expected vs Actual" statements before major operations
  • Use diff/delta logging more aggressively

If plan_revision is weak:

  • After any failure, write a NEW step before retrying
  • Break large steps into smaller ones when stuck

If tool_switching is weak:

  • If a tool fails twice, switch tools immediately
  • Prefer simpler tools when complex ones struggle

If memory_update is weak:

  • After each step, ask "what did I learn?"
  • Add lessons with specific triggers, not vague wisdom

If proof_generation is weak:

  • Attach evidence inline, not after the fact
  • For code changes: describe the diff
  • For tests: show the output

If stop_condition is weak:

  • When uncertain, frame as bounded options (not open questions)
  • Escalate BEFORE guessing, not after failing
Skills similaires