Our review
Evaluates an AI agent's adaptive behavior during a session using a 6-check rubric covering mismatch detection, plan revision, tool switching, memory update, proof generation, and stop condition.
Strengths
- Encourages honest self-assessment and reflection on session performance.
- Covers key dimensions of adaptive behavior with clear scoring criteria.
- Provides actionable improvement suggestions based on weakest check.
- Integrates with active_context.yaml for persistent records.
Limitations
- Relies on honest self-reporting; results are subjective and not externally validated.
- Requires prior setup of active_context.yaml and familiarity with the rubric.
- May not capture all aspects of agent performance or nuanced behavior.
Best used after completing significant work or encountering challenges, to reflect on and improve adaptive behavior.
Less useful for trivial sessions or when immediate progress matters more than reflection.
Security analysis
Safe
This skill only performs self-assessment: it reads a local YAML file (active_context.yaml), generates a score, and records the result back in that file. There are no destructive, exfiltrating, or obfuscated actions, and no external command execution or network access.
No concerns found
Examples
- Run the edge-score assessment on my recent session to evaluate my adaptive behavior against the 6-check rubric.
- Perform a 6-check self-assessment after completing the refactoring task. Score each check and update active_context.yaml with the results.
- Evaluate my adaptive behavior during the debugging session using the 6-check rubric, focusing on plan revision and tool switching.

name: edge-score
description: Self-assessment against the 6-check adaptation rubric. Use after completing significant work to evaluate adaptive behavior.
Self-Score: 6-Check Assessment
Evaluate your adaptive behavior during this session against the 6-check rubric.
Read active_context.yaml to understand what was accomplished.
The 6 Checks
Score yourself honestly on each:
1. Mismatch Detection
Question: Did I spot divergences between expectations and reality quickly?
| Score | Meaning |
|-------|---------|
| Met | Caught mismatches immediately, logged them with deltas |
| Missed | Plowed forward despite signals, retried without noticing |
2. Plan Revision
Question: When things went wrong, did I change my approach (not just retry)?
| Score | Meaning |
|-------|---------|
| Met | Wrote new strategies, reduced step size, added guards |
| Missed | Repeated the same step 3+ times |
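If the session's steps are available as a list, the "Missed" criterion can be checked mechanically. This is a minimal Python sketch, assuming a hypothetical log of step descriptions and reading "repeated the same step 3+ times" as three or more identical consecutive retries; the function names are illustrative and not part of the skill.

```python
from itertools import groupby


def longest_repeat_run(steps: list[str]) -> int:
    """Length of the longest run of identical consecutive steps (0 for an empty log)."""
    return max((len(list(run)) for _, run in groupby(steps)), default=0)


def plan_revision_missed(steps: list[str], threshold: int = 3) -> bool:
    """True when some step was retried unchanged `threshold` or more times in a row."""
    return longest_repeat_run(steps) >= threshold


# Hypothetical session log: the same call retried three times before a new step.
log = ["call API", "call API", "call API", "refresh token"]
print(plan_revision_missed(log))  # True
```

Counting only consecutive repeats means a step rerun after an intervening change, such as calling the API again after refreshing a token, is not treated as a blind retry.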
3. Tool Switching
Question: Did I abandon tools that weren't working and try alternatives?
| Score | Meaning |
|-------|---------|
| Met | Switched methods when one failed, preferred simpler approaches |
| N/A | No tool failures occurred |
| Missed | Kept hammering same tool despite failures |
4. Memory Update
Question: Did I capture reusable lessons from what I learned?
| Score | Meaning |
|-------|---------|
| Met | Added trigger-linked lessons to memory |
| Missed | Solved problems but didn't record patterns |
5. Proof Generation
Question: Did I attach evidence, not just claims?
| Score | Meaning |
|-------|---------|
| Met | Every major step has proof (logs, diffs, test results) |
| Missed | "Trust me" summaries without evidence |
6. Stop Condition
Question: Did I escalate appropriately when blocked or uncertain?
| Score | Meaning |
|-------|---------|
| Met | Asked crisp questions, presented bounded options |
| N/A | Never hit uncertainty requiring escalation |
| Missed | Guessed when it should have asked, or asked trivial questions |
Instructions
- Review the session
  - What mismatches occurred?
  - How did you respond?
  - What lessons were captured?
- Score each check. Be honest. Mark:
  - `met: true` if you satisfied the check
  - `met: false` if you missed it
  - Include a brief note explaining why
- Update active_context.yaml

      self_score:
        timestamp: "<current_iso_timestamp>"
        checks:
          mismatch_detection:
            met: true
            note: "Caught API 403 immediately, logged delta"
          plan_revision:
            met: true
            note: "Added token refresh step instead of retrying"
          tool_switching:
            met: false
            note: "N/A - no tool failures"
          memory_update:
            met: true
            note: "Added lesson about token expiry"
          proof_generation:
            met: true
            note: "Attached error log and fix diff"
          stop_condition:
            met: true
            note: "Asked about auth approach before proceeding"
        total: 5
        level: "real_agent"

- Determine level

  | Score | Level | Meaning |
  |-------|-------|---------|
  | 0-2 | `demo_automation` | Just following scripts |
  | 3-4 | `promising_fragile` | Some adaptation, gaps remain |
  | 5-6 | `real_agent` | True adaptive behavior |
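The scoring and level steps can be sketched in Python. This is an illustration under assumptions: the `checks` mapping mirrors the `self_score.checks` structure in the example record, `total` counts only checks recorded as `met: true` (which reproduces `total: 5` in that example, where the N/A tool_switching check is recorded as `met: false`), and the function names are hypothetical rather than part of the skill.

```python
from datetime import datetime, timezone

# Rubric order, matching the keys used under self_score.checks.
CHECKS = (
    "mismatch_detection",
    "plan_revision",
    "tool_switching",
    "memory_update",
    "proof_generation",
    "stop_condition",
)


def total_score(checks: dict) -> int:
    """Count the checks recorded as met: true; missing or false entries add nothing."""
    return sum(1 for name in CHECKS if checks.get(name, {}).get("met") is True)


def level_for(total: int) -> str:
    """Map a 0-6 total to the level bands from the Determine level table."""
    if not 0 <= total <= len(CHECKS):
        raise ValueError(f"total must be between 0 and {len(CHECKS)}, got {total}")
    if total <= 2:
        return "demo_automation"
    if total <= 4:
        return "promising_fragile"
    return "real_agent"


def build_self_score(checks: dict) -> dict:
    """Assemble the self_score record with an ISO 8601 UTC timestamp."""
    total = total_score(checks)
    return {
        "self_score": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "checks": checks,
            "total": total,
            "level": level_for(total),
        }
    }


# Same met values as the example record (tool_switching recorded as met: false).
example = {name: {"met": True} for name in CHECKS}
example["tool_switching"] = {"met": False, "note": "N/A - no tool failures"}
record = build_self_score(example)
print(record["self_score"]["total"], record["self_score"]["level"])  # 5 real_agent
```

To persist the record, the dict could be merged into active_context.yaml with a YAML library, for example PyYAML's `yaml.safe_dump`; that step is left out here so the sketch has no third-party dependency.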
Output
After scoring, provide:
- Score: N/6
- Level: demo_automation | promising_fragile | real_agent
- Strongest check: which one you did best
- Weakest check: which one to improve
- Carry-forward: what to do better next session
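The five output fields can be rendered as a fixed bullet list so summaries stay comparable across sessions. A small Python sketch, assuming the agent supplies the strongest check, weakest check, and carry-forward text itself (the skill leaves those as judgment calls); `format_summary` and the sample values are hypothetical.

```python
LEVELS = ("demo_automation", "promising_fragile", "real_agent")


def format_summary(total: int, level: str, strongest: str, weakest: str,
                   carry_forward: str) -> str:
    """Render the five output fields as a bullet list, rejecting impossible values."""
    if not 0 <= total <= 6:
        raise ValueError(f"total must be between 0 and 6, got {total}")
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return "\n".join([
        f"- Score: {total}/6",
        f"- Level: {level}",
        f"- Strongest check: {strongest}",
        f"- Weakest check: {weakest}",
        f"- Carry-forward: {carry_forward}",
    ])


# Hypothetical values for illustration.
print(format_summary(5, "real_agent", "mismatch_detection", "proof_generation",
                     "Attach test output inline with each fix"))
```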
Improvement Suggestions
Based on your weakest check, here are concrete improvements:
If mismatch_detection is weak:
- Add explicit "Expected vs Actual" statements before major operations
- Use diff/delta logging more aggressively
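An "Expected vs Actual" statement can be captured as a small record that logs a delta whenever both values are numeric. A minimal Python sketch; the `Expectation` class, its fields, and the log format are hypothetical choices, not something the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """One 'Expected vs Actual' statement recorded around an operation."""
    step: str
    expected: object
    actual: object

    @property
    def mismatch(self) -> bool:
        return self.expected != self.actual

    def delta(self):
        """Numeric difference actual - expected, or None for non-numeric values."""
        if isinstance(self.expected, (int, float)) and isinstance(self.actual, (int, float)):
            return self.actual - self.expected
        return None

    def log_line(self) -> str:
        status = "MISMATCH" if self.mismatch else "ok"
        line = f"[{status}] {self.step}: expected={self.expected!r} actual={self.actual!r}"
        d = self.delta()
        if self.mismatch and d is not None:
            line += f" delta={d:+}"  # signed, e.g. +2
        return line


print(Expectation("failing tests", 0, 2).log_line())
# [MISMATCH] failing tests: expected=0 actual=2 delta=+2
print(Expectation("response status", "200 OK", "403 Forbidden").log_line())
# [MISMATCH] response status: expected='200 OK' actual='403 Forbidden'
```

Writing the expectation before the operation, rather than reconstructing it afterward, is what makes the mismatch visible at the moment it happens.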
If plan_revision is weak:
- After any failure, write a NEW step before retrying
- Break large steps into smaller ones when stuck
If tool_switching is weak:
- If a tool fails twice, switch tools immediately
- Prefer simpler tools when complex ones struggle
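The "fails twice, then switch" rule can be expressed as a fallback loop over an ordered list of tools. A Python sketch under assumptions: each tool is a zero-argument callable, any raised exception counts as a failure, and `run_with_fallback`, `browser_fetch`, and `http_fetch` are hypothetical names. Listing a simpler tool after a complex one follows the "prefer simpler tools" suggestion.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_with_fallback(tools: Sequence[Tuple[str, Callable[[], T]]],
                      max_failures: int = 2) -> Tuple[str, T]:
    """Try each tool up to max_failures times, then move on to the next one."""
    errors = []
    for name, tool in tools:
        for attempt in range(1, max_failures + 1):
            try:
                return name, tool()
            except Exception as exc:  # any failure counts toward the limit
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all tools failed:\n" + "\n".join(errors))


def browser_fetch() -> str:
    raise TimeoutError("page did not load")  # hypothetical flaky, complex tool


def http_fetch() -> str:
    return "fetched via plain HTTP"  # hypothetical simpler alternative


print(run_with_fallback([("browser", browser_fetch), ("http", http_fetch)]))
# ('http', 'fetched via plain HTTP')
```

Catching `Exception` broadly keeps the sketch short; a real agent loop would likely distinguish transient errors, which justify the second attempt, from errors showing the tool is wrong for the job.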
If memory_update is weak:
- After each step, ask "what did I learn?"
- Add lessons with specific triggers, not vague wisdom
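A trigger-linked lesson pairs a concrete signal with an action, so it can be recalled when that signal shows up again. A Python sketch, assuming lessons live in a simple in-memory list and triggers are matched as case-insensitive substrings of an event message; `Lesson`, `lessons_for`, and the sample lessons are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    trigger: str  # concrete signal that should bring the lesson to mind
    action: str   # what to do when the trigger appears


def lessons_for(event: str, memory: list[Lesson]) -> list[Lesson]:
    """Return lessons whose trigger text occurs in the event (case-insensitive)."""
    text = event.lower()
    return [lesson for lesson in memory if lesson.trigger.lower() in text]


# Hypothetical memory entries.
memory = [
    Lesson("403", "Check token expiry and refresh before retrying the request"),
    Lesson("ModuleNotFoundError", "Confirm the virtualenv is active before reinstalling"),
]
for lesson in lessons_for("GET /users returned 403 Forbidden", memory):
    print(lesson.action)
```

A vague lesson such as "be careful with auth" has no trigger to match, which is why it never resurfaces at the moment it would help.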
If proof_generation is weak:
- Attach evidence inline, not after the fact
- For code changes: describe the diff
- For tests: show the output
If stop_condition is weak:
- When uncertain, frame as bounded options (not open questions)
- Escalate BEFORE guessing, not after failing
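A bounded escalation offers a small, labeled set of options instead of an open question. A Python sketch, assuming "bounded" means two to four options with an optional recommendation; `bounded_question` and the sample question are hypothetical.

```python
from string import ascii_uppercase
from typing import Optional


def bounded_question(question: str, options: list[str],
                     recommended: Optional[int] = None, max_options: int = 4) -> str:
    """Format an escalation as one question with a short, labeled list of options."""
    if not 2 <= len(options) <= max_options:
        raise ValueError(f"offer 2 to {max_options} options, got {len(options)}")
    if recommended is not None and not 0 <= recommended < len(options):
        raise ValueError(f"recommended index {recommended} is out of range")
    lines = [f"Q: {question}"]
    for i, option in enumerate(options):
        marker = " (recommended)" if i == recommended else ""
        lines.append(f"  {ascii_uppercase[i]}) {option}{marker}")
    return "\n".join(lines)


print(bounded_question(
    "Which auth approach should I use for the API?",
    ["Refresh the OAuth token and retry", "Switch to a service-account key"],
    recommended=0,
))
```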