CLIP-seq Peak Calling

VerifiedSafe

This skill calls protein-RNA binding site peaks from aligned CLIP-seq data using tools such as CLIPper, PureCLIP, or Piranha. It is used after read alignment to identify RBP binding regions before downstream analysis.

Sby Skills Guide Bot
Data & AIIntermediate
706/2/2026
Claude Code
#bioinformatics#clip-seq#peak-calling#rna-binding#genomics

Recommended for

Our review

This skill calls protein-RNA binding site peaks from CLIP-seq data using tools such as CLIPper, PureCLIP, or Piranha.

Strengths

  • Supports multiple leading peak callers (CLIPper, PureCLIP, Piranha).
  • Provides detailed commands and explanations of key options.
  • Offers adaptations for different CLIP protocols (eCLIP, iCLIP, PAR-CLIP).
  • Includes quality filters like FDR or confidence scores.

Limitations

  • Requires the tools (CLIPper, PureCLIP, etc.) to be pre-installed in the environment.
  • Does not cover downstream analysis (motifs, functional annotations).
  • Default parameters may not suit all datasets.
When to use it

Use this skill when you have aligned CLIP-seq data (BAM files) and want to identify binding sites of an RNA-binding protein.

When not to use it

Avoid this skill if you are working with standard RNA-seq data (not RNA-protein interactions) or if you need a full analysis pipeline including functional annotation of peaks.

Security analysis

Safe
Quality score90/100

The skill provides commands for standard bioinformatics tools (CLIPper, PureCLIP, etc.) for peak calling on CLIP-seq data. There are no destructive, exfiltration, or obfuscated commands, and no use of curl, sh, or similar risky patterns.

No concerns found

Examples

Basic CLIPper peak calling
Run CLIPper on deduped.bam for human genome hg38, output peaks.bed with FDR 0.05 and superlocal background.
PureCLIP with input control
Call crosslink sites using PureCLIP on clip.bam with control sminput.bam, genome.fa, 8 threads, and output both crosslink sites and binding regions.
Piranha basic usage
Run Piranha on deduped.bam with p-value threshold 0.01 and stranded analysis, output to peaks.bed.
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-clip-seq-clip-peak-calling description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads. tool_type: cli primary_tool: CLIPper measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

CLIP-seq Peak Calling

CLIPper (Recommended)

# Basic peak calling
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --save-pickle

# With FDR threshold
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --FDR 0.05 \
    --superlocal

# Specify gene annotations
clipper \
    -b deduped.bam \
    -s hg38 \
    --gene genes.bed \
    -o peaks.bed

CLIPper Options

| Option | Description | |--------|-------------| | -b | Input BAM file | | -s | Species (hg38, mm10) | | -o | Output BED file | | --FDR | FDR threshold (default 0.05) | | --superlocal | Use superlocal background | | --gene | Custom gene annotation BED | | --save-pickle | Save intermediate data |

PureCLIP (HMM-Based)

PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.

# Installation
conda install -c bioconda pureclip

# Basic peak calling
pureclip \
    -i deduped.bam \
    -bai deduped.bam.bai \
    -g genome.fa \
    -o crosslink_sites.bed \
    -or binding_regions.bed \
    -nt 4

# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions

PureCLIP Options

| Option | Description | |--------|-------------| | -i | Input BAM file | | -bai | BAM index file | | -g | Reference genome FASTA | | -o | Crosslink sites output | | -or | Binding regions output | | -nt | Number of threads | | -iv | Interval file to restrict analysis | | -dm | Min distance for merging |

PureCLIP with Input Control

# With SMInput control BAM
pureclip \
    -i clip.bam \
    -bai clip.bam.bai \
    -g genome.fa \
    -ibam sminput.bam \
    -ibai sminput.bam.bai \
    -o crosslinks.bed \
    -or regions.bed \
    -nt 8

# -ibam/-ibai: Input control BAM for background modeling

PureCLIP Output

# Crosslink sites BED contains:
# chr start end name score strand

# Score interpretation:
# Higher scores = more confident crosslink sites

# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed

PureCLIP for Different CLIP Types

# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4 -dm 8

# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

Piranha

# Basic usage
Piranha -s deduped.bam -o peaks.bed

# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01

# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u

# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial

PEAKachu (for PAR-CLIP)

# PAR-CLIP specific caller
peakachu adaptive \
    -c control.bam \
    -t treatment.bam \
    -r reference.fa \
    -o peakachu_peaks.gff

MACS3 for CLIP (Alternative)

# Use narrow peak calling mode
macs3 callpeak \
    -t deduped.bam \
    -f BAM \
    -g hs \
    -n clip_peaks \
    --nomodel \
    --extsize 50 \
    -q 0.01

Strand-Specific Peak Calling

# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam

# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed

# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed

Filter Peaks

# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed

# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed

# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed

Merge Replicates

# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
    sort -u > consensus_peaks.bed

# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
    awk '$4 >= 2' | \
    bedtools merge > consensus_peaks.bed

Peak Metrics

import pandas as pd

def load_clip_peaks(bed_path):
    peaks = pd.read_csv(bed_path, sep='\t', header=None,
                        names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
    return peaks

def peak_stats(peaks):
    stats = {
        'n_peaks': len(peaks),
        'mean_width': (peaks['end'] - peaks['start']).mean(),
        'median_score': peaks['score'].median(),
        'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
    }
    return stats

peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))

Quality Metrics

| Metric | Good Value | Description | |--------|------------|-------------| | Peak count | 1,000-50,000 | Depends on RBP | | Peak width | 20-100 nt | Typical for RBP footprint | | FRiP | >0.1 | Fraction reads in peaks |

Calculate FRiP

# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)

# Total reads
total_reads=$(samtools view -c deduped.bam)

# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"

Related Skills

  • clip-alignment - Generate aligned BAM
  • clip-preprocessing - UMI deduplication
  • binding-site-annotation - Annotate peaks with gene features
  • clip-motif-analysis - Find enriched motifs in peaks
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->
Related skills