Our take
This skill lets you record, track, and evaluate the accuracy of AI predictions over time, assigning each one a status (verified, falsified, etc.) and an accuracy score.
Strengths
- Clear structure with required and optional fields for each prediction
- Nuanced evaluation categories (verified, falsified, partially verified, etc.)
- Step-by-step evaluation process with 0-1 scoring
- Predictor calibration assessment (underconfident, overconfident, etc.)
Limitations
- Requires human interpretation for ambiguous predictions
- Does not collect evidence automatically
- Accuracy scoring remains subjective without predefined quantitative criteria
Use it when you want to analyze the prediction track record of an AI researcher or commentator to judge their reliability.
Avoid it if you are looking for an automatic fact-checking tool, or if the predictions are purely qualitative with no clear deadline.
Security analysis
Safe. This skill is purely descriptive and provides a framework for tracking and evaluating AI predictions. It does not instruct any code execution, tool usage, or actions that could compromise security. It is designed for manual use in recording and analyzing prediction accuracy.
No issues detected
Examples
Evaluate the prediction by Sam Altman from March 2023 that GPT-5 would achieve AGI by 2027. Use the prediction tracking skill: restate the prediction, identify timeframe, gather evidence, assign status, score accuracy, and note lessons.
Using the prediction tracking skill, compile a list of predictions made by Gary Marcus over the past 5 years, evaluate each with status and accuracy score, then compute overall accuracy and calibration (overconfident, underconfident, etc.) using the stats JSON output format.
name: prediction-tracking
description: Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
Prediction Tracking Skill
Track predictions made by AI researchers and critics, evaluate their accuracy over time.
Prediction Recording
When recording a new prediction, capture:
Required Fields
- text: The prediction as stated
- author: Who made it
- madeAt: When it was made
- timeframe: When they expect it to happen
- topic: What area of AI
- confidence: How confident they seemed
Optional Fields
- sourceUrl: Where the prediction was made
- targetDate: Specific date if mentioned
- conditions: Any caveats or conditions
- metrics: How to measure success
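As a minimal sketch, the fields above could be held in a small record type. The field names come from the lists above; the class name `Prediction`, the helper `missing_required`, the use of ISO date strings, and the free-text `confidence` value are all assumptions made for illustration, not part of the skill.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Prediction:
    # Required fields, named as in the list above.
    text: str        # the prediction as stated
    author: str      # who made it
    madeAt: str      # when it was made (ISO date string is an assumption)
    timeframe: str   # when they expect it to happen, as stated
    topic: str       # what area of AI
    confidence: str  # how confident they seemed (free text is an assumption)
    # Optional fields default to None when the source does not give them.
    sourceUrl: Optional[str] = None
    targetDate: Optional[str] = None
    conditions: Optional[str] = None
    metrics: Optional[str] = None

    def missing_required(self) -> list[str]:
        """Return the names of required fields that are blank."""
        required = ["text", "author", "madeAt", "timeframe", "topic", "confidence"]
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical record, invented purely to show the shape.
example = Prediction(
    text="Hypothetical example prediction",
    author="Example Author",
    madeAt="2023-03-01",
    timeframe="within five years",
    topic="reasoning",
    confidence="high",
)
record = asdict(example)  # plain dict, ready to serialize as JSON
```

Keeping the optional fields as `None` rather than empty strings makes it easy to tell "not stated" apart from "stated but blank" when the record is reviewed later.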
Evaluation Status
When evaluating predictions, assign one of:
verified
Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described
falsified
Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim
partially-verified
Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct
too-early
Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way
unfalsifiable
Cannot be objectively assessed.
- Too vague to measure
- No clear success criteria
- Moved goalposts
ambiguous
Prediction was too vague to evaluate.
- Multiple interpretations possible
- Success criteria unclear
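The six statuses above map naturally onto an enumeration. The sketch below also shows one possible reading of the too-early rule; the names `Status` and `is_too_early`, and the choice to return False when no target date is known, are assumptions rather than anything the skill prescribes.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Status(str, Enum):
    """The evaluation statuses listed above, with their string values."""
    VERIFIED = "verified"
    FALSIFIED = "falsified"
    PARTIALLY_VERIFIED = "partially-verified"
    TOO_EARLY = "too-early"
    UNFALSIFIABLE = "unfalsifiable"
    AMBIGUOUS = "ambiguous"


def is_too_early(target_date: Optional[date], today: date,
                 has_definitive_evidence: bool) -> bool:
    """True when the timeframe is still open and nothing definitive has emerged."""
    if target_date is None:
        # Assumption: without a deadline, a human has to make the call.
        return False
    return today <= target_date and not has_definitive_evidence
```

For example, `is_too_early(date(2027, 12, 31), date(2025, 1, 1), False)` reports that the prediction is still pending, while the same call with a date after the deadline does not.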
Evaluation Process
For each prediction being evaluated:
1. Restate the prediction
What exactly was claimed?
2. Identify timeframe
Has enough time passed to evaluate?
3. Gather evidence
What has happened since?
- Relevant releases or announcements
- Benchmark results
- Real-world deployments
- Counter-evidence
4. Assess status
Which evaluation status applies?
5. Score accuracy
If verifiable, rate 0.0-1.0:
- 1.0: Exactly as predicted
- 0.7-0.9: Substantially correct
- 0.4-0.6: Partially correct
- 0.1-0.3: Mostly wrong
- 0.0: Completely wrong
6. Note lessons
What does this tell us about:
- The author's forecasting ability
- The topic's predictability
- Common prediction pitfalls
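The scoring bands in step 5 can be turned into a small lookup. The bands as listed leave gaps (for instance between 0.6 and 0.7), so this sketch assumes each band extends down to the next one; that choice, and the function name `score_band`, are illustrative assumptions.

```python
def score_band(score: float) -> str:
    """Map an accuracy score in [0.0, 1.0] to the band labels from step 5."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("accuracy score must be between 0.0 and 1.0")
    if score == 1.0:
        return "exactly as predicted"
    if score >= 0.7:   # covers 0.7-0.9 and, by assumption, values up to 1.0
        return "substantially correct"
    if score >= 0.4:   # covers 0.4-0.6 and, by assumption, values up to 0.7
        return "partially correct"
    if score > 0.0:    # covers 0.1-0.3 and, by assumption, any other nonzero value
        return "mostly wrong"
    return "completely wrong"
```

A reviewer would still choose the score by hand; the function only keeps the wording of the band consistent across evaluations.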
Output Format
For evaluation:
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
For accuracy statistics:
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
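The statistics object could be derived from a list of evaluated predictions along the following lines. This is a sketch under several assumptions: each input row carries a `topic` alongside `status` and `accuracyScore` (the evaluation format above has no topic field), "too-early" is counted under "pending", "ambiguous" has no counter of its own because the sample output shows none, and averages are taken over scored rows only. The `calibration` text is left for the reviewer to write.

```python
from collections import defaultdict

# Assumed mapping from status values to the counter names in the sample output.
COUNTER_FOR_STATUS = {
    "verified": "verified",
    "falsified": "falsified",
    "partially-verified": "partiallyVerified",
    "too-early": "pending",
    "unfalsifiable": "unfalsifiable",
}


def mean(values):
    """Average rounded to two decimals, or None when there is nothing to average."""
    return round(sum(values) / len(values), 2) if values else None


def accuracy_stats(author, rows):
    """Aggregate rows of {'status', 'accuracyScore', 'topic'} into a stats dict."""
    stats = {"author": author, "totalPredictions": len(rows)}
    for counter in COUNTER_FOR_STATUS.values():
        stats[counter] = 0
    scores = []
    scores_by_topic = defaultdict(list)
    count_by_topic = defaultdict(int)
    for row in rows:
        counter = COUNTER_FOR_STATUS.get(row["status"])
        if counter is not None:
            stats[counter] += 1
        count_by_topic[row["topic"]] += 1
        score = row.get("accuracyScore")
        if score is not None:  # unscored rows count toward totals only
            scores.append(score)
            scores_by_topic[row["topic"]].append(score)
    stats["averageAccuracy"] = mean(scores)
    stats["topicBreakdown"] = {
        topic: {"predictions": count, "accuracy": mean(scores_by_topic[topic])}
        for topic, count in count_by_topic.items()
    }
    return stats


# Hypothetical rows, invented for illustration only.
rows = [
    {"status": "verified", "accuracyScore": 0.9, "topic": "reasoning"},
    {"status": "falsified", "accuracyScore": 0.1, "topic": "reasoning"},
    {"status": "too-early", "accuracyScore": None, "topic": "agents"},
]
stats = accuracy_stats("Example Author", rows)
```

Averaging only over scored rows keeps pending predictions from dragging the average toward zero; a different convention is equally defensible as long as it is applied consistently.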
Calibration Assessment
Evaluate whether predictors are well-calibrated:
Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately
Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence
Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative
Inconsistent
- Confidence doesn't correlate with accuracy
- Random relationship between stated and actual accuracy
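One rough way to apply these four categories is to compare hit rates for high-confidence and low-confidence predictions. The sketch below is only one possible reading: the two-level confidence scale, the 0.7 cutoff for "usually come true", the decision order, and the function name `classify_calibration` are all assumptions, and the behavioral signals (hedging, updating on evidence) still need human judgment.

```python
def classify_calibration(outcomes, usually=0.7):
    """Classify a predictor from (confidence, came_true) pairs.

    confidence is "high" or "low"; came_true is a bool.
    """
    high = [ok for conf, ok in outcomes if conf == "high"]
    low = [ok for conf, ok in outcomes if conf == "low"]
    if not high or not low:
        return "insufficient data"
    high_rate = sum(high) / len(high)
    low_rate = sum(low) / len(low)
    if high_rate >= usually:
        # High-confidence calls usually land; check whether low-confidence
        # calls land just as often, which suggests excessive hedging.
        return "underconfident" if low_rate >= usually else "well-calibrated"
    if high_rate > low_rate:
        # Confidence still tracks accuracy, but high-confidence calls fail too often.
        return "overconfident"
    # Stated confidence does not track accuracy at all.
    return "inconsistent"


# Hypothetical track record: 4 of 5 high-confidence calls and 1 of 2 low-confidence calls came true.
sample = [("high", True)] * 4 + [("high", False), ("low", True), ("low", False)]
label = classify_calibration(sample)
```

With only a handful of predictions per confidence level the rates are noisy, so the label is best treated as a prompt for closer review rather than a verdict.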
Tracking Notable Predictors
Keep running assessments of key voices:
| Predictor | Total | Accuracy | Calibration | Notes |
|-----------|-------|----------|-------------|-------|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |
Red Flags
Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything