Our take

Continuously monitors RL training logs, extracts key metrics, detects anomalies, and analyzes rollout and judge outputs in depth for LLM-as-a-Judge studies.

Strengths

Automated real-time monitoring
Early detection of anomalies such as exploding gradients
Research-oriented analysis of reward hacking patterns and biases
Structured reports with suspicious cases and hypotheses

Limitations

Requires a valid log file path
Depends on basic Unix commands (tail, grep)
Can produce a large volume of output that must be reviewed manually

When to use it

During a long RL training run that needs automated monitoring and qualitative analysis of emergent behaviors.

When to avoid it

For a one-off inspection of already archived logs with no need for continuous monitoring.

Examples

Basic RL log monitoring with periodic summaries

Monitor the RL training log at /path/to/log.txt, extract reward and loss every 100 steps, and provide a summary highlighting any anomalies or stagnation.

Scan rollout outputs for reward hacking

Scan the rollout output files in /path/to/rollout/ for suspicious patterns that might indicate reward hacking, such as unusually high scores with obvious flaws in the response. List potential hacking patterns.

Analyze judge outputs for bias

Analyze the judge output files in /path/to/judge/ for systematic bias in scoring. Look for score distributions skewed by rubric dimensions or entity mentions, and report any concerning trends.

name: rl-log-monitor
description: Continuously monitor RL training logs and summarize key metrics, anomalies, and trends
allowed-tools: Read, Bash(tail:, grep:)
context: fork
agent: Explore

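Monitor the RL training log $ARGUMENTS:

1. Basic monitoring

Use tail -f to read the log file continuously
Extract key metrics: reward, loss, episode length, success rate
Identify anomalous patterns: exploding gradients, stalled convergence, performance degradation
Produce an interim summary every N iterations
Flag issues that need human intervention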
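
The extraction and anomaly checks above could be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: the `key=value` log format, the `grad_norm` field, and the thresholds `grad_limit`, `window`, and `min_gain` are all assumptions chosen for illustration.

```python
import math
import re
from statistics import mean

# Assumed (hypothetical) log line format: "step=120 reward=0.53 loss=1.27 grad_norm=3.4"
METRIC_RE = re.compile(r"(\w+)=(-?(?:\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|nan|inf))")


def parse_line(line):
    """Return a dict of the numeric key=value metrics found in one log line."""
    return {key: float(value) for key, value in METRIC_RE.findall(line)}


def find_anomalies(records, grad_limit=100.0, window=3, min_gain=0.01):
    """Flag non-finite losses, large gradient norms, and reward stagnation.

    All thresholds are illustrative defaults, not recommended values.
    """
    anomalies = []
    for rec in records:
        step = int(rec.get("step", -1))
        loss = rec.get("loss")
        if loss is not None and not math.isfinite(loss):
            anomalies.append((step, "non-finite loss"))
        if rec.get("grad_norm", 0.0) > grad_limit:
            anomalies.append((step, "possible gradient explosion"))

    # Compare the mean reward of the last `window` records with the window before it.
    rewards = [rec["reward"] for rec in records if "reward" in rec]
    if len(rewards) >= 2 * window:
        earlier = mean(rewards[-2 * window:-window])
        recent = mean(rewards[-window:])
        if recent - earlier < min_gain:
            last_step = int(records[-1].get("step", -1))
            anomalies.append((last_step, "reward stagnation or decline"))
    return anomalies
```

In practice the records would come from lines streamed by tail -f; calling `find_anomalies` every N parsed records gives one possible basis for the interim summaries.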

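2. Rollout & judge output scanning (research-oriented)

Research background

This project studies the use of LLM-as-a-Judge (LaaJ) in RL, focusing on:

How rubrics are designed and used in the RL pipeline
The characteristics of reward hacking and how well hidden it is (whether it can fool the in-domain test set)
The effect of hidden biases on training outcomes

Scanning tasks

Scan rollout outputs file by file
- Check whether the generated responses show suspicious patterns
- Identify cases that score high but mismatch human preferences (e.g., a preference for a particular region leading to related entities appearing frequently in responses)
Scan judge outputs file by file
- Analyze the judge's score distribution and anomalies
- Identify systematic judge biases (inherent or injected)
- Track how scores on each rubric dimension change over time
Uncover reward hacking signatures
- Find the "shortcut" patterns the model has learned
- Record cases with very high scores that are clearly unreasonable (valuable for dissemination and impact)
- Compare training with and without bias injection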
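
One simple way to look for the entity-driven bias described in the scanning tasks is to compare judge scores for responses that mention a watched entity against those that do not, and to list high-scoring responses that mention it repeatedly. The sketch below assumes each case is a dict with `prompt`, `response`, and `score` keys; the function names, the case-insensitive substring matching, and the thresholds are hypothetical choices, and a real study would likely need proper entity recognition.

```python
from statistics import mean


def entity_score_gap(cases, entities):
    """Mean score of responses mentioning any watched entity minus the mean of the rest.

    Returns None when one of the two groups is empty.
    """
    lowered = [e.lower() for e in entities]
    with_entity, without_entity = [], []
    for case in cases:
        text = case["response"].lower()
        bucket = with_entity if any(e in text for e in lowered) else without_entity
        bucket.append(case["score"])
    if not with_entity or not without_entity:
        return None
    return mean(with_entity) - mean(without_entity)


def suspicious_cases(cases, entities, score_threshold=8.0, min_mentions=3):
    """Return high-scoring cases whose responses mention watched entities often."""
    lowered = [e.lower() for e in entities]
    flagged = []
    for case in cases:
        text = case["response"].lower()
        mentions = sum(text.count(e) for e in lowered)
        if case["score"] >= score_threshold and mentions >= min_mentions:
            flagged.append({**case, "mentions": mentions})
    return flagged
```

A large positive gap is only a prompt for manual review, not evidence of hacking on its own: the entity may simply correlate with better answers. Running the same comparison on runs with and without bias injection gives a more informative contrast.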

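Output format

After each scan, generate a report containing:

Summary of key findings
List of suspicious cases (prompt, response, score, analysis)
Hypothesized hacking patterns
Suggested points for manual review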
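
The four report sections listed above map naturally onto a small data structure. The following is a hedged sketch of one possible representation; the class names, field names, and plain-text layout are assumptions for illustration and are not prescribed by the skill.

```python
from dataclasses import dataclass, field


@dataclass
class SuspiciousCase:
    prompt: str
    response: str
    score: float
    analysis: str


@dataclass
class ScanReport:
    key_findings: list = field(default_factory=list)
    suspicious_cases: list = field(default_factory=list)
    hacking_hypotheses: list = field(default_factory=list)
    review_points: list = field(default_factory=list)

    def render(self):
        """Render the report as plain text, one titled section per field."""
        sections = [
            ("Key findings", self.key_findings),
            ("Suspicious cases", [
                f"score={c.score} | prompt={c.prompt!r} | "
                f"response={c.response!r} | {c.analysis}"
                for c in self.suspicious_cases
            ]),
            ("Hacking pattern hypotheses", self.hacking_hypotheses),
            ("Points for manual review", self.review_points),
        ]
        lines = []
        for title, items in sections:
            lines.append(title)
            # Empty sections are kept visible so a missing finding is explicit.
            lines.extend(f"- {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)
```

Keeping empty sections in the rendered text makes it obvious when a scan produced no hypotheses or review points, rather than leaving the reader to infer it from a missing heading.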
