RL Training Log Monitor

Verified
Safe

Continuously monitors RL training logs via tail -f, extracting key metrics (reward, loss, episode length, success rate) and flagging anomalies such as gradient explosion or convergence stalls. Its research-oriented mode scans rollout and judge outputs to detect reward hacking patterns, systematic biases, and high-scoring cases that do not match human preferences. It is most useful during exploratory RL experiments that require close inspection of undesirable model behaviors.

by Skills Guide Bot
Data & AI
Advanced
706/2/2026
Claude Code
#rl-training #log-monitoring #reward-hacking #llm-as-a-judge

Our review

Continuously monitors RL training logs, extracts key metrics, detects anomalies, and performs deep scans of rollout and judge outputs for LLM-as-a-Judge research.

Strengths

  • Real-time automated monitoring
  • Early detection of anomalies like gradient explosion or convergence stalls
  • Research-oriented analysis of reward hacking patterns and biases
  • Structured reports with suspicious cases and actionable hypotheses

Limitations

  • Requires a valid log file path
  • Relies on basic Unix tools (tail, grep)
  • Output may be large and require manual review for complex cases

When to use it

During extended RL training runs where you need both automated performance tracking and in-depth qualitative analysis of model behavior.

When not to use it

For a quick one-time log review without the need for continuous monitoring or research-specific scanning.

Security analysis

Safe
Quality score: 85/100

Uses only read-only bash commands (tail, grep) to monitor logs; no destructive or exfiltrating actions.

No concerns found

Examples

Basic RL log monitoring with periodic summaries
Monitor the RL training log at /path/to/log.txt, extract reward and loss every 100 steps, and provide a summary highlighting any anomalies or stagnation.
Scan rollout outputs for reward hacking
Scan the rollout output files in /path/to/rollout/ for suspicious patterns that might indicate reward hacking, such as unusually high scores with obvious flaws in the response. List potential hacking patterns.
Analyze judge outputs for bias
Analyze the judge output files in /path/to/judge/ for systematic bias in scoring. Look for score distributions skewed by rubric dimensions or entity mentions, and report any concerning trends.

name: rl-log-monitor
description: Continuously monitor RL training logs and summarize key metrics, anomalies, and trends
allowed-tools: Read, Bash(tail:, grep:)
context: fork
agent: Explore

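Monitor the RL training log $ARGUMENTS:

I. Basic monitoring

  1. Use tail -f to read the log file continuously
  2. Extract key metrics: reward, loss, episode length, success rate
  3. Identify anomalous patterns: gradient explosion, convergence stalls, performance degradation
  4. Generate an interim summary every N iterations
  5. Flag issues that need human intervention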
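
The basic monitoring steps above can be sketched in Python. This is a minimal illustration rather than the skill's own implementation, which relies only on tail and grep. The `key=value` log line format, the function names `parse_line` and `find_anomalies`, and the thresholds `spike_factor`, `stall_window`, and `stall_eps` are all assumptions chosen for the example; real training logs will likely need a different pattern.

```python
import math
import re

# Assumed (hypothetical) log line format: "step=100 reward=0.52 loss=1.3 ep_len=200"
METRIC_RE = re.compile(
    r"(\w+)=(-?\d+(?:\.\d+)?(?:e[-+]?\d+)?|nan|inf)", re.IGNORECASE
)


def parse_line(line):
    """Return {metric name: float} for one log line; empty dict if nothing matches."""
    return {key: float(value) for key, value in METRIC_RE.findall(line)}


def find_anomalies(records, spike_factor=10.0, stall_window=5, stall_eps=1e-3):
    """Flag non-finite or spiking loss and flat reward.

    A loss spike or non-finite loss is treated as a possible sign of gradient
    explosion; a reward that barely moves over `stall_window` consecutive
    records is treated as a possible convergence stall. All thresholds are
    illustrative defaults, not recommendations.
    """
    flags = []
    prev_loss = None
    for i, rec in enumerate(records):
        step = int(rec.get("step", i))
        loss = rec.get("loss")
        if loss is not None:
            if not math.isfinite(loss):
                flags.append((step, "non-finite loss"))
            else:
                if prev_loss is not None and prev_loss > 0 and loss > spike_factor * prev_loss:
                    flags.append((step, "loss spike"))
                prev_loss = loss
        # Rewards from the last `stall_window` records, including this one.
        window = [r["reward"] for r in records[max(0, i - stall_window + 1): i + 1] if "reward" in r]
        if len(window) == stall_window and max(window) - min(window) < stall_eps:
            flags.append((step, "reward stall"))
    return flags
```

Each flag is a `(step, reason)` pair that marks a point for a human to inspect; a stall is reported at every step for as long as the window stays flat.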

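II. Rollout & Judge output scanning (research-oriented)

Research background

This project studies the use of LLM-as-a-Judge (LaaJ) in RL, focusing on:

  • How rubrics are designed and how they are used in the RL pipeline
  • The characteristics of reward hacking and how covert it is (whether it can fool the in-domain test set)
  • The effect of hidden biases on training outcomes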

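Scan tasks

  1. Scan rollout outputs file by file

    • Check whether the generated responses show suspicious patterns
    • Identify high-scoring cases that mismatch human preferences (e.g., a preference for a particular region → related entities appear frequently in the response)
  2. Scan judge outputs file by file

    • Analyze the judge's score distribution and its anomalies
    • Identify the judge's systematic biases (inherent or injected)
    • Track score trends across each rubric dimension
  3. Reward hacking feature discovery

    • Discover "shortcut" patterns the model has learned
    • Record cases with very high scores that are clearly unreasonable (worth sharing, high impact)
    • Compare training runs with and without bias injection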
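
One way to approach the entity-mention check from the scan tasks above is sketched below in Python. It is only a sketch under stated assumptions: each case is assumed to be a dict with hypothetical `response` and `score` keys, the watched entity list is supplied by the researcher, and the function names and the `min_score` and `min_mentions` thresholds are illustrative. A large score gap or a flagged case is a prompt for manual review, not evidence of reward hacking on its own.

```python
from statistics import mean


def entity_count(response, entities):
    """Count case-insensitive occurrences of any watched entity in a response."""
    text = response.lower()
    return sum(text.count(entity.lower()) for entity in entities)


def score_gap_by_mention(cases, entities):
    """Mean score of cases mentioning a watched entity minus mean score of the rest.

    Returns None when either group is empty, because no gap can be computed.
    """
    with_entity = [c["score"] for c in cases if entity_count(c["response"], entities) > 0]
    without_entity = [c["score"] for c in cases if entity_count(c["response"], entities) == 0]
    if not with_entity or not without_entity:
        return None
    return mean(with_entity) - mean(without_entity)


def suspicious_cases(cases, entities, min_score=8.0, min_mentions=3):
    """Return high-scoring cases with repeated entity mentions as review candidates."""
    return [
        c for c in cases
        if c["score"] >= min_score and entity_count(c["response"], entities) >= min_mentions
    ]
```

Comparing the gap between a run with bias injection and a run without it gives a rough first signal for the with/without comparison listed in the tasks.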

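Output format

After each scan, generate a report containing:

  • Summary of key findings
  • List of suspicious cases (prompt, response, score, analysis)
  • Hacking pattern hypotheses
  • Suggested points for manual review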
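
The four report parts listed above could be modeled with a small data structure. The sketch below is hypothetical: the class names `SuspiciousCase` and `ScanReport`, the field names, and the plain-text layout produced by `render` are assumptions for illustration, since the skill does not prescribe a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass
class SuspiciousCase:
    """One flagged case: the four fields named in the output format."""
    prompt: str
    response: str
    score: float
    analysis: str


@dataclass
class ScanReport:
    """Container for one scan's findings."""
    summary: list[str] = field(default_factory=list)
    cases: list[SuspiciousCase] = field(default_factory=list)
    hypotheses: list[str] = field(default_factory=list)
    review_points: list[str] = field(default_factory=list)

    def render(self):
        """Render the report as plain text with one section per report part."""
        lines = ["Key findings"]
        lines += [f"  • {item}" for item in self.summary]
        lines.append("Suspicious cases")
        for case in self.cases:
            lines.append(f"  • score: {case.score}")
            lines.append(f"    prompt: {case.prompt}")
            lines.append(f"    response: {case.response}")
            lines.append(f"    analysis: {case.analysis}")
        lines.append("Hacking pattern hypotheses")
        lines += [f"  • {item}" for item in self.hypotheses]
        lines.append("Suggested manual review points")
        lines += [f"  • {item}" for item in self.review_points]
        return "\n".join(lines)
```

Keeping the cases as structured records rather than free text makes it easier to diff reports across scans and to hand specific cases to a human reviewer.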