CLIP-seq Peak Detection

This skill identifies protein-RNA binding sites from aligned CLIP-seq data using tools such as CLIPper, PureCLIP, or Piranha. It is useful after read alignment to locate the binding peaks of RNA-binding proteins (RBPs) before downstream analyses.

By Skills Guide Bot
Data & AI · Intermediate
800 · 02/06/2026
Claude Code
#bioinformatics #clip-seq #peak-calling #rna-binding #genomics

Our take

This skill identifies protein-RNA binding sites from CLIP-seq data using tools such as CLIPper, PureCLIP, or Piranha.

Strengths

  • Supports several state-of-the-art peak-calling tools (CLIPper, PureCLIP, Piranha).
  • Provides detailed commands and explanations of the key options.
  • Offers adaptations for different CLIP protocols (eCLIP, iCLIP, PAR-CLIP).
  • Includes quality filters such as FDR or confidence scores.

Limitations

  • Requires the tools (CLIPper, PureCLIP, etc.) to be installed in the environment already.
  • Does not cover downstream analysis (motifs, functional annotation).
  • The default parameters may not suit every dataset.

When to use it

Use this skill when you have aligned CLIP-seq data (BAM) and want to identify the binding sites of an RNA-binding protein.

When to avoid it

Avoid this skill if you are working with standard RNA-seq data (unrelated to RNA-protein interactions) or if you need a complete analysis that includes functional annotation of the peaks.

Security analysis

Safe
Quality score: 90/100

The skill provides commands for standard bioinformatics tools (CLIPper, PureCLIP, etc.) for peak calling on CLIP-seq data. There are no destructive, exfiltration, or obfuscated commands, and no use of curl, sh, or similar risky patterns.

No issues detected

Examples

Basic CLIPper peak calling
Run CLIPper on deduped.bam for human genome hg38, output peaks.bed with FDR 0.05 and superlocal background.
PureCLIP with input control
Call crosslink sites using PureCLIP on clip.bam with control sminput.bam, genome.fa, 8 threads, and output both crosslink sites and binding regions.
Piranha basic usage
Run Piranha on deduped.bam with p-value threshold 0.01 and stranded analysis, output to peaks.bed.
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-clip-seq-clip-peak-calling
description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads.
tool_type: cli
primary_tool: CLIPper
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

CLIP-seq Peak Calling

CLIPper (Recommended)

# Basic peak calling
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --save-pickle

# With FDR threshold
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --FDR 0.05 \
    --superlocal

# Specify gene annotations
clipper \
    -b deduped.bam \
    -s hg38 \
    --gene genes.bed \
    -o peaks.bed

CLIPper Options

| Option | Description |
|--------|-------------|
| -b | Input BAM file |
| -s | Species (hg38, mm10) |
| -o | Output BED file |
| --FDR | FDR threshold (default 0.05) |
| --superlocal | Use superlocal background |
| --gene | Custom gene annotation BED |
| --save-pickle | Save intermediate data |
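
The options in the table can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_clipper_command` (it is not part of CLIPper). It only uses the flags listed above and returns an argument list that could be handed to `subprocess.run`; it does not run anything itself.

```python
from typing import List, Optional


def build_clipper_command(
    bam: str,
    species: str,
    out_bed: str,
    fdr: Optional[float] = None,
    superlocal: bool = False,
    gene_bed: Optional[str] = None,
    save_pickle: bool = False,
) -> List[str]:
    """Assemble a clipper argument list from the options in the table.

    Hypothetical helper for illustration: it builds the command only.
    """
    cmd = ["clipper", "-b", bam, "-s", species, "-o", out_bed]
    if fdr is not None:
        if not 0 < fdr <= 1:
            raise ValueError("FDR must be in (0, 1]")
        cmd += ["--FDR", str(fdr)]
    if superlocal:
        cmd.append("--superlocal")
    if gene_bed is not None:
        cmd += ["--gene", gene_bed]
    if save_pickle:
        cmd.append("--save-pickle")
    return cmd


cmd = build_clipper_command(
    "deduped.bam", "hg38", "peaks.bed", fdr=0.05, superlocal=True
)
print(" ".join(cmd))
# clipper -b deduped.bam -s hg38 -o peaks.bed --FDR 0.05 --superlocal
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.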

PureCLIP (HMM-Based)

PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.

# Installation
conda install -c bioconda pureclip

# Basic peak calling
pureclip \
    -i deduped.bam \
    -bai deduped.bam.bai \
    -g genome.fa \
    -o crosslink_sites.bed \
    -or binding_regions.bed \
    -nt 4

# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions

PureCLIP Options

| Option | Description |
|--------|-------------|
| -i | Input BAM file |
| -bai | BAM index file |
| -g | Reference genome FASTA |
| -o | Crosslink sites output |
| -or | Binding regions output |
| -nt | Number of threads |
| -iv | Chromosomes used to learn the HMM parameters (e.g. 'chr1;chr2;chr3'), which reduces runtime and memory |
| -dm | Distance for merging crosslink sites into binding regions |

PureCLIP with Input Control

# With SMInput control BAM
pureclip \
    -i clip.bam \
    -bai clip.bam.bai \
    -g genome.fa \
    -ibam sminput.bam \
    -ibai sminput.bam.bai \
    -o crosslinks.bed \
    -or regions.bed \
    -nt 8

# -ibam/-ibai: Input control BAM for background modeling

PureCLIP Output

# Crosslink sites BED contains:
# chr start end name score strand

# Score interpretation:
# Higher scores = more confident crosslink sites

# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
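
The same score filter can be written in Python when the result feeds further scripting. This is a minimal sketch, assuming a six-column BED layout with the score in column 5 as described above; `filter_bed_by_score` is a hypothetical helper name and the records are made up for illustration.

```python
import csv
import io


def filter_bed_by_score(lines, min_score):
    """Yield BED rows (as lists) whose column-5 score is >= min_score."""
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        # Skip blank lines and track/comment headers
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        if float(row[4]) >= min_score:
            yield row


# Toy input: made-up records, not real PureCLIP output
example = io.StringIO(
    "chr1\t100\t101\tsite1\t2.5\t+\n"
    "chr1\t250\t251\tsite2\t6.1\t-\n"
    "chr2\t900\t901\tsite3\t3.0\t+\n"
)
kept = list(filter_bed_by_score(example, min_score=3))
print([row[3] for row in kept])
# ['site2', 'site3']
```

Passing an open file handle instead of the `io.StringIO` object applies the same filter to a real BED file.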

PureCLIP for Different CLIP Types

# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4 -dm 8

# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

# PAR-CLIP (PureCLIP models read truncations rather than T-to-C transitions, so treat these calls with caution)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

Piranha

# Basic usage (-s asks Piranha to sort the input; the BAM file is the positional input)
Piranha -s deduped.bam -o peaks.bed

# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01

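# Strand option (-u): check `Piranha -h` first; the flag is understood to make Piranha ignore strand, i.e. unstranded analysis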
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u

# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial

PEAKachu (for PAR-CLIP)

# PAR-CLIP specific caller
peakachu adaptive \
    -c control.bam \
    -t treatment.bam \
    -r reference.fa \
    -o peakachu_peaks.gff

MACS3 for CLIP (Alternative)

# Use narrow peak calling mode
macs3 callpeak \
    -t deduped.bam \
    -f BAM \
    -g hs \
    -n clip_peaks \
    --nomodel \
    --extsize 50 \
    -q 0.01

Strand-Specific Peak Calling

# Split BAM by strand (-F 16 excludes reverse-strand reads, -f 16 keeps only them)
samtools view -b -F 16 deduped.bam > plus_strand.bam
samtools view -b -f 16 deduped.bam > minus_strand.bam

# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed

# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed

Filter Peaks

# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed

# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed

# By read count (if stored in the score field, column 5)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
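
The score and width criteria can also be applied in a single pass. The sketch below is an illustration only: the `Peak` dataclass and `filter_peaks` function are hypothetical names, the default thresholds simply mirror the awk commands above, and the peaks are made up. Unlike the awk commands, which write one file per criterion, this keeps only peaks that pass both.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive
    name: str
    score: float
    strand: str

    @property
    def width(self) -> int:
        return self.end - self.start


def filter_peaks(peaks: Iterable[Peak], min_score: float = 10,
                 min_width: int = 20, max_width: int = 200) -> List[Peak]:
    """Keep peaks that pass both the score and the width criteria."""
    return [
        p for p in peaks
        if p.score >= min_score and min_width <= p.width <= max_width
    ]


# Made-up peaks for illustration
peaks = [
    Peak("chr1", 100, 150, "p1", 12.0, "+"),   # passes both filters
    Peak("chr1", 300, 310, "p2", 25.0, "+"),   # too narrow (10 nt)
    Peak("chr2", 500, 580, "p3", 4.0, "-"),    # score too low
    Peak("chr2", 900, 1200, "p4", 30.0, "-"),  # too wide (300 nt)
]
print([p.name for p in filter_peaks(peaks)])
# ['p1']
```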

Merge Replicates

# Use bedtools to find consensus peaks: rep1 peaks that overlap a rep2 peak
# on the same strand (-s), each reported once (-u)
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -u -s > consensus_peaks.bed

# Require overlap in at least 2 of 3 replicates
# (inputs must be coordinate-sorted; multiinter ignores strand)
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
    awk '$4 >= 2' | \
    bedtools merge -i - > consensus_peaks.bed
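
To make the "at least N replicates" logic concrete, here is a pure-Python sketch of the same idea using a sweep over interval boundaries. It is an illustration under stated assumptions, not a replacement for bedtools: `consensus_regions` is a hypothetical function name, strand is ignored, coordinates are BED-style half-open, and the intervals are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def consensus_regions(replicates: List[List[Interval]],
                      min_reps: int = 2) -> List[Interval]:
    """Return merged regions covered by peaks from >= min_reps replicates."""
    # chrom -> position -> list of (delta, replicate id) boundary events
    events: Dict[str, Dict[int, List[Tuple[int, int]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for rep_id, rep_peaks in enumerate(replicates):
        for chrom, start, end in rep_peaks:
            events[chrom][start].append((1, rep_id))
            events[chrom][end].append((-1, rep_id))

    result: List[Interval] = []
    for chrom in sorted(events):
        depth: Dict[int, int] = defaultdict(int)  # open peaks per replicate
        region_start = None
        for pos in sorted(events[chrom]):
            # Apply every boundary at this position before evaluating coverage
            for delta, rep_id in events[chrom][pos]:
                depth[rep_id] += delta
            n_reps = sum(1 for d in depth.values() if d > 0)
            if n_reps >= min_reps and region_start is None:
                region_start = pos
            elif n_reps < min_reps and region_start is not None:
                result.append((chrom, region_start, pos))
                region_start = None
    return result


# Made-up replicate peaks for illustration
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr1", 700, 800)]
rep3 = [("chr1", 180, 300), ("chr1", 550, 650)]
print(consensus_regions([rep1, rep2, rep3], min_reps=2))
# [('chr1', 150, 250), ('chr1', 550, 600)]
```

Counting replicates rather than raw intervals means that two overlapping peaks from the same replicate still count as support from one replicate only.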

Peak Metrics

import pandas as pd

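def load_clip_peaks(bed_path):
    # Read only the first six BED columns; peak callers may append extra ones,
    # which pandas would otherwise treat as index columns
    peaks = pd.read_csv(bed_path, sep='\t', header=None, usecols=list(range(6)),
                        names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
    return peaks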

def peak_stats(peaks):
    stats = {
        'n_peaks': len(peaks),
        'mean_width': (peaks['end'] - peaks['start']).mean(),
        'median_score': peaks['score'].median(),
        'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
    }
    return stats

peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))

Quality Metrics

| Metric | Good Value | Description |
|--------|------------|-------------|
| Peak count | 1,000-50,000 | Depends on RBP |
| Peak width | 20-100 nt | Typical for RBP footprint |
| FRiP | >0.1 | Fraction of reads in peaks |
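
The ranges in the table can be turned into a quick pass/fail check. This is a minimal sketch with a hypothetical function name, `check_clip_qc`; comparing the median peak width against the 20-100 nt range is an assumption, since the table does not say which summary of width it refers to, and the input numbers are made up.

```python
from typing import Dict


def check_clip_qc(n_peaks: int, median_width: float, frip: float) -> Dict[str, bool]:
    """Compare summary metrics with the ranges in the quality table."""
    return {
        "peak_count": 1_000 <= n_peaks <= 50_000,
        # Assumption: the width range is applied to the median peak width
        "peak_width": 20 <= median_width <= 100,
        "frip": frip > 0.1,
    }


# Made-up metrics for illustration
report = check_clip_qc(n_peaks=12_000, median_width=45, frip=0.08)
print(report)
# {'peak_count': True, 'peak_width': True, 'frip': False}
```

Because the expected peak count depends on the RBP, a failed check is a prompt to look closer rather than proof of a bad experiment.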

Calculate FRiP

# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)

# Total reads
total_reads=$(samtools view -c deduped.bam)

# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"
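
For intuition about what the shell commands compute, the sketch below calculates FRiP on in-memory intervals. It is an illustration only: `fraction_reads_in_peaks` and `merge_intervals` are hypothetical names, strand is ignored as in the commands above, coordinates are half-open, and the reads and peaks are made up. It also returns 0.0 instead of failing when there are no reads.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs into sorted blocks."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_reads_in_peaks(reads, peaks):
    """reads, peaks: iterables of (chrom, start, end) with half-open coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], merged)

    total = in_peaks = 0
    for chrom, start, end in reads:
        total += 1
        if chrom not in index:
            continue
        starts, merged = index[chrom]
        # Last merged peak that starts before the read ends
        i = bisect_left(starts, end) - 1
        if i >= 0 and merged[i][1] > start:
            in_peaks += 1
    return in_peaks / total if total else 0.0


# Made-up data for illustration
peaks = [("chr1", 100, 200), ("chr1", 150, 260), ("chr2", 50, 80)]
reads = [
    ("chr1", 90, 120),   # overlaps the merged chr1 peak
    ("chr1", 250, 300),  # overlaps the merged chr1 peak
    ("chr1", 260, 300),  # starts exactly where the peak ends: no overlap
    ("chr2", 10, 50),    # ends exactly where the peak starts: no overlap
    ("chr3", 1, 40),     # chromosome without peaks
]
frip = fraction_reads_in_peaks(reads, peaks)
print(f"FRiP: {frip:.4f}")
# FRiP: 0.4000
```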

Related Skills

  • clip-alignment - Generate aligned BAM
  • clip-preprocessing - UMI deduplication
  • binding-site-annotation - Annotate peaks with gene features
  • clip-motif-analysis - Find enriched motifs in peaks
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->