Our review
This skill calls protein-RNA binding site peaks from CLIP-seq data using tools such as CLIPper, PureCLIP, or Piranha.
Strengths
- Supports multiple leading peak callers (CLIPper, PureCLIP, Piranha).
- Provides detailed commands and explanations of key options.
- Offers adaptations for different CLIP protocols (eCLIP, iCLIP, PAR-CLIP).
- Includes quality filters like FDR or confidence scores.
Limitations
- Requires the tools (CLIPper, PureCLIP, etc.) to be pre-installed in the environment.
- Does not cover downstream analysis (motifs, functional annotations).
- Default parameters may not suit all datasets.
Use this skill when you have aligned CLIP-seq data (BAM files) and want to identify binding sites of an RNA-binding protein.
Avoid this skill if you are working with standard RNA-seq data (not RNA-protein interactions) or if you need a full analysis pipeline including functional annotation of peaks.
Security analysis
SafeThe skill provides commands for standard bioinformatics tools (CLIPper, PureCLIP, etc.) for peak calling on CLIP-seq data. There are no destructive, exfiltration, or obfuscated commands, and no use of curl, sh, or similar risky patterns.
No concerns found
Examples
Run CLIPper on deduped.bam for human genome hg38, output peaks.bed with FDR 0.05 and superlocal background.Call crosslink sites using PureCLIP on clip.bam with control sminput.bam, genome.fa, 8 threads, and output both crosslink sites and binding regions.Run Piranha on deduped.bam with p-value threshold 0.01 and stranded analysis, output to peaks.bed.name: bio-clip-seq-clip-peak-calling description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads. tool_type: cli primary_tool: CLIPper measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
CLIP-seq Peak Calling
CLIPper (Recommended)
# Basic peak calling
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--save-pickle
# With FDR threshold
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--FDR 0.05 \
--superlocal
# Specify gene annotations
clipper \
-b deduped.bam \
-s hg38 \
--gene genes.bed \
-o peaks.bed
CLIPper Options
| Option | Description | |--------|-------------| | -b | Input BAM file | | -s | Species (hg38, mm10) | | -o | Output BED file | | --FDR | FDR threshold (default 0.05) | | --superlocal | Use superlocal background | | --gene | Custom gene annotation BED | | --save-pickle | Save intermediate data |
PureCLIP (HMM-Based)
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation
conda install -c bioconda pureclip
# Basic peak calling
pureclip \
-i deduped.bam \
-bai deduped.bam.bai \
-g genome.fa \
-o crosslink_sites.bed \
-or binding_regions.bed \
-nt 4
# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions
PureCLIP Options
| Option | Description | |--------|-------------| | -i | Input BAM file | | -bai | BAM index file | | -g | Reference genome FASTA | | -o | Crosslink sites output | | -or | Binding regions output | | -nt | Number of threads | | -iv | Interval file to restrict analysis | | -dm | Min distance for merging |
PureCLIP with Input Control
# With SMInput control BAM
pureclip \
-i clip.bam \
-bai clip.bam.bai \
-g genome.fa \
-ibam sminput.bam \
-ibai sminput.bam.bai \
-o crosslinks.bed \
-or regions.bed \
-nt 8
# -ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
# Crosslink sites BED contains:
# chr start end name score strand
# Score interpretation:
# Higher scores = more confident crosslink sites
# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4 -dm 8
# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
Piranha
# Basic usage
Piranha -s deduped.bam -o peaks.bed
# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
# PAR-CLIP specific caller
peakachu adaptive \
-c control.bam \
-t treatment.bam \
-r reference.fa \
-o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
# Use narrow peak calling mode
macs3 callpeak \
-t deduped.bam \
-f BAM \
-g hs \
-n clip_peaks \
--nomodel \
--extsize 50 \
-q 0.01
Strand-Specific Peak Calling
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
sort -u > consensus_peaks.bed
# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
awk '$4 >= 2' | \
bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd
def load_clip_peaks(bed_path):
peaks = pd.read_csv(bed_path, sep='\t', header=None,
names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
return peaks
def peak_stats(peaks):
stats = {
'n_peaks': len(peaks),
'mean_width': (peaks['end'] - peaks['start']).mean(),
'median_score': peaks['score'].median(),
'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
}
return stats
peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))
Quality Metrics
| Metric | Good Value | Description | |--------|------------|-------------| | Peak count | 1,000-50,000 | Depends on RBP | | Peak width | 20-100 nt | Typical for RBP footprint | | FRiP | >0.1 | Fraction reads in peaks |
Calculate FRiP
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
# Total reads
total_reads=$(samtools view -c deduped.bam)
# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"
Related Skills
- clip-alignment - Generate aligned BAM
- clip-preprocessing - UMI deduplication
- binding-site-annotation - Annotate peaks with gene features
- clip-motif-analysis - Find enriched motifs in peaks
Prompt Engineering
Data & AI
Prompt engineering best practices and templates to maximize AI outputs.
Data Visualization
Data & AI
Generates data visualizations and charts tailored to your data.
RAG Architecture Setup
Data & AI
Setup guide for RAG (Retrieval-Augmented Generation) architectures.