Bayesian Monte Carlo Re-Analysis


Performs Bayesian Monte Carlo re-analysis of clinical trials by combining prior evidence with observed data. Used for underpowered trials, synthesis of multiple studies, or drug repurposing to estimate the probability of a clinically meaningful treatment effect.

By Skills Guide Bot
Data & AI · Advanced
706/2/2026
Claude Code
#bayesian #reanalysis #monte-carlo #literature-prior #treatment-effect

Our review

Performs a Bayesian Monte Carlo re-analysis combining literature-derived priors with trial data to estimate the probability of a clinically meaningful treatment effect.

Strengths

  • Incorporates prior evidence to boost statistical power in underpowered trials
  • Provides an interpretable probability of meaningful effect (P(effect))
  • Includes sensitivity analysis with skeptical and enthusiastic priors
  • Generates clear diagnostic visualizations (HDI, ROPE)

Limitations

  • Requires extracted effect sizes from moderate- or high-quality literature
  • Relies on the validity of the normal-normal conjugate model
  • Does not replace a well-designed randomized trial

When to use it

Use this skill when a trial is underpowered or when multiple prior studies exist and you need to quantify the probability of a real effect.

When not to use it

Do not use it if no reliable prior evidence exists, if populations are fundamentally different, or if the trial is already clearly positive or negative with adequate power.

Security analysis

Caution
Quality score: 90/100

The skill instructs the AI agent to copy and run a Python script that processes local JSON data. While the purpose is legitimate Bayesian analysis, execution of code from a template carries inherent risk if the template were malicious. The skill does not itself contain harmful commands, but it facilitates file operations and script execution, which warrants caution.

Findings

  • Executes a Python script that reads/writes local files; potential risk if template is compromised, but no destructive/exfiltrating commands evident.

Examples

Bayesian re-analysis of underpowered trial with prior evidence

I have an underpowered trial with an HR of 0.82 (95% CI 0.63–1.07) for survival. Prior evidence includes two moderate-quality studies with effect sizes of 0.75 and 0.85. The MCID is 0.80. Run a Bayesian Monte Carlo re-analysis to compute the posterior probability of a meaningful effect.

Sensitivity analysis with skeptical and enthusiastic priors

Perform a Bayesian re-analysis on this binary endpoint (RR 0.70, 95% CI 0.50–0.98) with MCID 0.80. Use inverse-variance weighted pooling from the prior evidence report and generate sensitivity analyses with skeptical and enthusiastic priors. Show the 4-panel diagnostic chart.

name: bayesian-reanalysis
description: Bayesian Monte Carlo re-analysis using literature-derived priors, posterior sampling, HDI, and ROPE analysis to assess probability of meaningful treatment effect.

Bayesian Monte Carlo Re-Analysis

When to Use

When prior evidence exists and you want to answer: "Given what we already know, what's the probability this treatment has a meaningful effect?"

This is fundamentally different from frequentist analysis. Frequentist methods ask "how surprising is this data if the null is true?" Bayesian analysis asks "given the data AND prior evidence, what do we believe about the treatment effect?"

Use this module when:

  • U1 (Underpowered Trial): A "negative" trial exists but prior evidence suggests the effect might be real. The Bayesian posterior can shift the balance when the frequentist CI is wide.
  • T2 (Interventional Efficacy): Multiple prior studies exist with varying results. Bayesian synthesis provides a coherent probability statement.
  • T4 (Repurposing): Evidence from the original indication provides an informative prior for the new indication.

Do NOT use when:

  • No prior evidence exists (uninformative priors add nothing over frequentist analysis)
  • The prior evidence is from fundamentally different populations/mechanisms
  • The trial is well-powered and clearly positive or negative (Bayesian analysis won't change the conclusion)

Prerequisites

Before running this module, these sandbox files should exist:

  • ./parsed_hypothesis.json: PICO and trial data
  • ./prior_evidence_report.json: must contain extracted_effect_sizes with at least one entry of quality "moderate" or "high"
  • ./power_analysis_results.json: frequentist analysis results (for MCID and CI)

If prior_evidence_report.json has no extracted_effect_sizes or all are quality "low", skip this module and note in output that Bayesian analysis was not performed due to insufficient prior evidence.
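
The skip rule above can be checked before the template is copied. The following is a minimal sketch, assuming each entry in extracted_effect_sizes is a JSON object with a "quality" string; the function names are hypothetical and not part of the template.

```python
import json
from pathlib import Path

# Quality labels that qualify an entry as usable prior evidence (from the rule above).
USABLE_QUALITIES = {"moderate", "high"}


def has_usable_prior(report: dict) -> bool:
    """Return True if at least one extracted effect size is moderate or high quality.

    Assumption: entries are dicts carrying a "quality" key; the real report
    may hold additional fields that this check ignores.
    """
    entries = report.get("extracted_effect_sizes") or []
    return any(
        str(entry.get("quality", "")).lower() in USABLE_QUALITIES
        for entry in entries
    )


def check_prerequisites(report_path: str = "./prior_evidence_report.json") -> bool:
    """Return False when the report is missing or holds no usable evidence."""
    path = Path(report_path)
    if not path.exists():
        return False
    return has_usable_prior(json.loads(path.read_text()))
```

If check_prerequisites() returns False, the module should be skipped and the output should note that Bayesian analysis was not performed.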

Method

This module uses Monte Carlo sampling from the exact analytical posterior (Normal-Normal conjugate). The template:

  1. Reads prior_evidence_report.json and constructs an informative prior via inverse-variance weighted pooling
  2. Computes the exact analytical posterior
  3. Draws 10,000 posterior samples for HDI and ROPE analysis
  4. Runs sensitivity analysis across 3 prior specifications
  5. Generates a 4-panel diagnostic chart
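
Steps 2 and 3 can be sketched as follows. This is an illustration of the standard Normal-Normal conjugate update, not the template's actual code. The prior mean and SD are hypothetical placeholders, the trial numbers come from the survival example on this page, and "meaningful" is assumed to mean an HR below the MCID (lower is better).

```python
from math import log, sqrt

import numpy as np


def normal_normal_posterior(prior_mean, prior_sd, obs, se):
    """Exact posterior for a normal prior and a normal likelihood with known SE."""
    prior_prec = 1.0 / prior_sd**2   # precision = 1 / variance
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * obs)
    return post_mean, sqrt(post_var)


# Hypothetical prior on the log-HR scale (placeholder values for illustration).
prior_mean, prior_sd = log(0.80), 0.28
# Observed HR 0.82 with 95% CI 0.63-1.07 gives SE of about 0.135 on the log scale.
obs, se = log(0.82), 0.135

post_mean, post_sd = normal_normal_posterior(prior_mean, prior_sd, obs, se)

# Monte Carlo draws from the exact posterior.
rng = np.random.default_rng(42)
samples = rng.normal(post_mean, post_sd, size=10_000)

# Assumption: benefit means HR < MCID, i.e. log-HR < log(MCID).
mcid = 0.80
prob_meaningful = float(np.mean(samples < log(mcid)))
```

The posterior mean is a precision-weighted average of the prior mean and the observed effect, so a tight prior pulls the estimate harder than a wide one.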

Instructions

Step 1: Extract Trial Data

From ./parsed_hypothesis.json, extract:

  • Endpoint type (binary, continuous, survival)
  • Observed effect (RR, HR, mean difference)
  • 95% CI bounds (to derive SE)

From ./power_analysis_results.json, extract:

  • MCID (minimum clinically important difference)
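
Deriving the SE from the 95% CI bounds is a short calculation. The sketch below assumes a symmetric Wald-type interval, taken on the log scale for ratio measures (HR, RR) and on the raw scale for mean differences; se_from_ci is a hypothetical helper name.

```python
from math import log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(ci_lower, ci_upper, log_scale=True):
    """Back out the standard error from a symmetric 95% confidence interval.

    log_scale=True  -> ratio measures (HR, RR): the CI is symmetric on the log scale.
    log_scale=False -> mean differences: the CI is symmetric on the raw scale.
    """
    if log_scale:
        return (log(ci_upper) - log(ci_lower)) / (2 * Z_95)
    return (ci_upper - ci_lower) / (2 * Z_95)


# Survival example from this page: HR 0.82, 95% CI 0.63-1.07.
se = se_from_ci(0.63, 1.07)  # roughly 0.135 on the log-HR scale
```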

Step 2: Customize and Run the Template

Copy templates/bayesian_mc_reanalysis.py to ./bayesian_mc_reanalysis.py.

Edit the CONFIG section only:

ENDPOINT_TYPE = "survival"  # "binary" | "continuous" | "survival"
OBSERVED_EFFECT = 0.82      # HR, RR, or mean difference
CI_LOWER = 0.63
CI_UPPER = 1.07
MCID = 0.80                 # Minimum clinically important difference
ROPE_BOUNDS = None          # Auto-compute, or set [lower, upper] on the analysis scale (log scale for HR/RR)
N_SAMPLES = 10_000

Then run:

python3 ./bayesian_mc_reanalysis.py

The template automatically:

  • Reads prior_evidence_report.json and constructs the prior
  • Pools effect sizes using inverse-variance weighting
  • Maps study quality/type to prior SD
  • Generates skeptical and enthusiastic priors for sensitivity
  • Computes HDI (Highest Density Interval)
  • Runs ROPE (Region of Practical Equivalence) analysis
  • Produces 4-panel chart and JSON output

Step 3: Interpret Results

The template writes ./bayesian_reanalysis.json and ./bayesian_reanalysis.png.

Key outputs to report:

  1. P(meaningful effect): Core probability statement

    • > 0.80: Strong posterior support
    • 0.50–0.80: Moderate support
    • 0.20–0.50: Weak support
    • < 0.20: Very weak
  2. HDI (Highest Density Interval): Narrowest 95% credible interval. More informative than equal-tailed CI for skewed posteriors. Report on natural scale (HR, RR, etc.).

  3. ROPE analysis: Does the posterior overlap with "practically null" effects?

    • P(ROPE) > 0.95: Accept practical equivalence; no meaningful effect
    • P(ROPE) < 0.05: Reject equivalence; the effect is very unlikely to be negligible
    • Otherwise: Undecided; data insufficient
  4. Sensitivity: Is the verdict robust across skeptical / evidence-based / enthusiastic priors?

    • Robust = same category under all 3 → strong conclusion
    • Sensitive = flips between priors → data insufficient to overcome prior uncertainty
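
For reference, the HDI in item 2 can be estimated from posterior draws by finding the narrowest window that holds 95% of the sorted samples. This is a common sample-based approach and only a sketch; how the template computes its HDI is not shown on this page, and the sample values below are illustrative.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    sorted_s = np.sort(np.asarray(samples, dtype=float))
    n = sorted_s.size
    window = int(np.floor(mass * n))
    if window < 1 or window >= n:
        raise ValueError("need more samples for the requested mass")
    # Width of every candidate interval spanning `window` sorted steps.
    widths = sorted_s[window:] - sorted_s[: n - window]
    start = int(np.argmin(widths))
    return float(sorted_s[start]), float(sorted_s[start + window])


# Illustrative posterior draws on the log-HR scale (values are made up).
rng = np.random.default_rng(0)
draws = rng.normal(-0.20, 0.12, size=10_000)

lo, hi = hdi(draws)
# Report on the natural scale by exponentiating the log-scale bounds.
hr_hdi = (float(np.exp(lo)), float(np.exp(hi)))
```

On the log scale a normal posterior is symmetric, so its HDI and equal-tailed interval nearly coincide. This sketch transforms the log-scale bounds, which is one common convention; an HDI computed directly on exponentiated draws would differ slightly.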

Step 4: Write the Verdict

Synthesize into a clear statement:

  • BAYESIAN SUPPORT: P(meaningful) > 0.80 under evidence-based prior, > 0.50 under skeptical prior. Prior and data agree. ROPE rejected.
  • BAYESIAN LEAN: P(meaningful) 0.50–0.80 under evidence-based prior. Data shift the posterior but don't overcome skepticism. Consider adaptive design.
  • BAYESIAN NEUTRAL: P(meaningful) 0.20–0.50. Prior and data are in tension or both weak. ROPE undecided.
  • BAYESIAN AGAINST: P(meaningful) < 0.20. Even with favorable prior, the data do not support a meaningful effect.
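
The verdict rules above can be written as a small mapping. This is a sketch, not the template's logic: the handling of exact boundary values and of the case where P exceeds 0.80 under the evidence-based prior but not 0.50 under the skeptical prior (downgraded to "lean" here) are assumptions. The return strings match the verdict values in the output schema.

```python
def bayesian_verdict(p_evidence_based: float, p_skeptical: float) -> str:
    """Map P(meaningful effect) under two priors to a verdict label."""
    if p_evidence_based > 0.80 and p_skeptical > 0.50:
        return "bayesian_support"
    # Assumption: strong evidence-based support that fails the skeptical
    # check falls through to "lean" rather than "support".
    if p_evidence_based >= 0.50:
        return "bayesian_lean"
    if p_evidence_based >= 0.20:
        return "bayesian_neutral"
    return "bayesian_against"
```

For example, bayesian_verdict(0.85, 0.40) yields "bayesian_lean" under this convention because the skeptical prior is not overcome.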

Limitations and Caveats

You MUST mention these in any report that includes Bayesian results:

  1. LLM-extracted priors: Effect sizes used to construct the prior were extracted by an LLM from PubMed abstracts, not by a trained systematic reviewer. Prior SDs are inflated by roughly 20% to 50% relative to textbook values to partially compensate, but misextraction remains possible.

  2. No heterogeneity modeling: The prior SD is based on study count and concordance, not on formal between-study heterogeneity (I²/tau²). A meta-analysis with high heterogeneity deserves a wider prior than reflected here.

  3. Double-counting risk: If the trial being analyzed was included in a meta-analysis used as prior, the posterior is overconfident. Check that the prior studies are independent of the current trial.

  4. ROPE bounds are heuristic: Default ROPE (± half of |log(MCID)|) is a computational convenience, not a clinically validated equivalence margin.

  5. Conjugate model only: The Normal-Normal model assumes symmetric uncertainty on the log scale. For very rare events or extreme effects, this approximation breaks down.

Bottom line: This Bayesian analysis provides a structured framework for prior-data synthesis, not a definitive probability. The sensitivity analysis is the most important output — if the verdict flips between priors, the data cannot adjudicate.
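
A sensitivity check of this kind can be sketched as below. The three prior specifications are hypothetical: this page does not state how the template builds its skeptical and enthusiastic priors, so the sketch centers the skeptical prior on the null and the enthusiastic prior on a larger benefit. Because the posterior is exactly normal, the probability is computed in closed form here instead of by sampling.

```python
from math import log, sqrt
from statistics import NormalDist


def posterior(prior_mean, prior_sd, obs, se):
    """Normal-Normal conjugate update; returns (posterior mean, posterior SD)."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / se**2
    var = 1.0 / (w_prior + w_data)
    return var * (w_prior * prior_mean + w_data * obs), sqrt(var)


def support_category(p):
    """Support bands from Step 3; boundary handling is an assumption."""
    if p > 0.80:
        return "strong"
    if p >= 0.50:
        return "moderate"
    if p >= 0.20:
        return "weak"
    return "very_weak"


def sensitivity(obs, se, log_mcid, priors):
    """Run the update under each prior and flag whether the category is stable."""
    out = {}
    for name, (mean, sd) in priors.items():
        post_mean, post_sd = posterior(mean, sd, obs, se)
        # Assumption: benefit means the log effect falls below log(MCID).
        p = NormalDist(post_mean, post_sd).cdf(log_mcid)
        out[name] = {"post_mean": post_mean, "post_sd": post_sd,
                     "prob_meaningful": p, "category": support_category(p)}
    categories = {entry["category"] for entry in out.values()}
    out["robust"] = len(categories) == 1
    return out


# Hypothetical prior specifications on the log-HR scale.
priors = {
    "skeptical": (0.0, 0.28),          # centered on no effect
    "evidence_based": (log(0.80), 0.28),
    "enthusiastic": (log(0.70), 0.28),
}
result = sensitivity(obs=log(0.82), se=0.135, log_mcid=log(0.80), priors=priors)
```

When result["robust"] is False, the category changes with the prior, which is the situation described above where the data cannot adjudicate.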

Prior Construction Reference

The template constructs priors automatically, but understanding the rules is important for interpretation:

| Evidence Source | Prior SD (log scale) | Rationale |
|----------------|---------------------|-----------|
| Meta-analysis, concordant | 0.15 | Strong prior; inflated from 0.10 for LLM extraction |
| 3+ studies, concordant | 0.22 | Strong prior; inflated from 0.15 |
| 2 studies, concordant | 0.28 | Moderate prior; inflated from 0.20 |
| 2 studies, conflicting | 0.35 | Wider; conflicting evidence |
| Single RCT | 0.30 | Moderate; inflated from 0.25 |
| Single observational | 0.40 | Weak; confounding + extraction uncertainty |
| No usable evidence | 0.50 | Uninformative; skip module |

Note: All SDs are intentionally wider than textbook recommendations (by roughly 20% to 50%, depending on the evidence source) because effect sizes are extracted by an LLM, not a trained reviewer. If the spread of extracted values exceeds the base SD, the SD is inflated further.

Prior mean: Inverse-variance weighted pooled estimate from moderate/high quality studies, on the analysis scale (log-HR, log-RR, or raw difference).
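
Inverse-variance weighting gives each study a weight of 1/SE², so more precise studies dominate the pooled mean. The sketch below is a generic fixed-effect pooling routine, not the template's code. The two effect sizes come from the example prompt on this page, while their standard errors are hypothetical.

```python
from math import log, sqrt


def pool_inverse_variance(estimates, ses):
    """Fixed-effect inverse-variance pooling on the analysis scale.

    Returns (pooled estimate, pooled standard error).
    """
    if not estimates or len(estimates) != len(ses):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


# Two prior studies with HR 0.75 and 0.85; the SEs are made up for illustration.
log_hrs = [log(0.75), log(0.85)]
study_ses = [0.15, 0.20]
prior_mean, pooled_se = pool_inverse_variance(log_hrs, study_ses)
```

The pooled SE here is the fixed-effect value. Per the table above, the prior SD the template uses is taken from the evidence-source rules, not from this pooled SE.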

ROPE Bounds Reference

ROPE (Region of Practical Equivalence) defines effect sizes too small to matter:

| Measure | Default ROPE | Rationale |
|---------|-------------|-----------|
| HR/RR (log scale) | ± half of \|log(MCID)\| | Effects within this range are clinically negligible |
| Mean difference | ± half of \|MCID\| | Same principle for continuous outcomes |

Set ROPE_BOUNDS explicitly in CONFIG if the default is inappropriate for the clinical context.
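
The default bounds and the decision rule from Step 3 can be sketched together. The function names are hypothetical; the thresholds (0.95 and 0.05) and the decision strings come from this document.

```python
from math import log

import numpy as np


def default_rope(mcid, ratio_measure=True):
    """Default ROPE: plus or minus half of |log(MCID)| for HR/RR, half of |MCID| otherwise."""
    half = 0.5 * abs(log(mcid) if ratio_measure else mcid)
    return -half, half


def rope_decision(samples, bounds):
    """Return (P(in ROPE), decision) from posterior draws on the analysis scale."""
    lower, upper = bounds
    draws = np.asarray(samples, dtype=float)
    prob_in_rope = float(np.mean((draws >= lower) & (draws <= upper)))
    if prob_in_rope > 0.95:
        decision = "accept_equivalence"
    elif prob_in_rope < 0.05:
        decision = "reject_equivalence"
    else:
        decision = "undecided"
    return prob_in_rope, decision


# MCID of 0.80 on a ratio scale gives bounds of about -0.112 and +0.112 in log units.
bounds = default_rope(0.80)
```

Note that the ROPE is centered on the null (0 on the log scale), so rejecting it says the effect is unlikely to be negligible but does not by itself establish the direction of the effect.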

Output Schema

The template writes ./bayesian_reanalysis.json:

{
  "module": "bayesian_reanalysis",
  "method": "monte_carlo",
  "n_samples": 10000,
  "endpoint_type": "binary|continuous|survival",
  "prior": {
    "source": "description of evidence used",
    "citations": ["PMID:12345678"],
    "mean": 0.0,
    "sd": 0.0,
    "scale": "log_hr|log_rr|raw_difference",
    "quality": "high|moderate|low|none",
    "n_studies": 0,
    "rationale": "How the prior was constructed"
  },
  "likelihood": {
    "observed": 0.0,
    "se": 0.0,
    "scale": "log(HR)"
  },
  "posterior": {
    "mean": 0.0,
    "sd": 0.0,
    "ci_95": [0.0, 0.0],
    "hdi_95": [0.0, 0.0],
    "natural_scale": {
      "point_estimate": 0.0,
      "hdi_95": [0.0, 0.0]
    }
  },
  "prob_meaningful_effect": 0.0,
  "mcid": 0.0,
  "rope": {
    "bounds": [0.0, 0.0],
    "bounds_natural_scale": [0.0, 0.0],
    "prob_in_rope": 0.0,
    "decision": "reject_equivalence|accept_equivalence|undecided"
  },
  "sensitivity": {
    "skeptical": {
      "prior_mean": 0.0,
      "prior_sd": 0.0,
      "post_mean": 0.0,
      "post_sd": 0.0,
      "hdi_95": [0.0, 0.0],
      "prob_meaningful": 0.0
    },
    "evidence_based": { "..." : "same structure" },
    "enthusiastic": { "..." : "same structure" },
    "robust": true
  },
  "verdict": "bayesian_support|bayesian_lean|bayesian_neutral|bayesian_against",
  "severity": "low|medium|high|critical",
  "title": "One-line summary",
  "analysis": "Detailed paragraph explaining the Bayesian reasoning",
  "charts": ["./bayesian_reanalysis.png"]
}