Notre avis
Cette compétence permet d'identifier les sites de liaison protéine-ARN à partir de données CLIP-seq en utilisant des outils comme CLIPper, PureCLIP ou Piranha.
Points forts
- Prend en charge plusieurs outils de pointe pour l'appel de pics (CLIPper, PureCLIP, Piranha).
- Fournit des commandes détaillées et des explications sur les options clés.
- Propose des adaptations pour différents protocoles CLIP (eCLIP, iCLIP, PAR-CLIP).
- Inclut des filtres de qualité comme le FDR ou les scores de confiance.
Limites
- Nécessite que les outils (CLIPper, PureCLIP, etc.) soient déjà installés dans l'environnement.
- Ne couvre pas l'analyse en aval (motifs, annotations fonctionnelles).
- Les paramètres par défaut peuvent ne pas convenir à tous les jeux de données.
Utilisez cette compétence lorsque vous avez des données CLIP-seq alignées (BAM) et que vous souhaitez identifier les sites de liaison d'une protéine de liaison à l'ARN.
Évitez cette compétence si vous travaillez avec des données RNA-seq standard (non liées aux interactions ARN-protéine) ou si vous avez besoin d'une analyse complète incluant l'annotation fonctionnelle des pics.
Analyse de sécurité
SûrThe skill provides commands for standard bioinformatics tools (CLIPper, PureCLIP, etc.) for peak calling on CLIP-seq data. There are no destructive, exfiltration, or obfuscated commands, and no use of curl, sh, or similar risky patterns.
Aucun point d'attention détecté
Exemples
Run CLIPper on deduped.bam for human genome hg38, output peaks.bed with FDR 0.05 and superlocal background.Call crosslink sites using PureCLIP on clip.bam with control sminput.bam, genome.fa, 8 threads, and output both crosslink sites and binding regions.Run Piranha on deduped.bam with p-value threshold 0.01 and stranded analysis, output to peaks.bed.name: bio-clip-seq-clip-peak-calling description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads. tool_type: cli primary_tool: CLIPper measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
CLIP-seq Peak Calling
CLIPper (Recommended)
# Basic peak calling
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--save-pickle
# With FDR threshold
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--FDR 0.05 \
--superlocal
# Specify gene annotations
clipper \
-b deduped.bam \
-s hg38 \
--gene genes.bed \
-o peaks.bed
CLIPper Options
| Option | Description | |--------|-------------| | -b | Input BAM file | | -s | Species (hg38, mm10) | | -o | Output BED file | | --FDR | FDR threshold (default 0.05) | | --superlocal | Use superlocal background | | --gene | Custom gene annotation BED | | --save-pickle | Save intermediate data |
PureCLIP (HMM-Based)
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation
conda install -c bioconda pureclip
# Basic peak calling
pureclip \
-i deduped.bam \
-bai deduped.bam.bai \
-g genome.fa \
-o crosslink_sites.bed \
-or binding_regions.bed \
-nt 4
# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions
PureCLIP Options
| Option | Description | |--------|-------------| | -i | Input BAM file | | -bai | BAM index file | | -g | Reference genome FASTA | | -o | Crosslink sites output | | -or | Binding regions output | | -nt | Number of threads | | -iv | Interval file to restrict analysis | | -dm | Min distance for merging |
PureCLIP with Input Control
# With SMInput control BAM
pureclip \
-i clip.bam \
-bai clip.bam.bai \
-g genome.fa \
-ibam sminput.bam \
-ibai sminput.bam.bai \
-o crosslinks.bed \
-or regions.bed \
-nt 8
# -ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
# Crosslink sites BED contains:
# chr start end name score strand
# Score interpretation:
# Higher scores = more confident crosslink sites
# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4 -dm 8
# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
Piranha
# Basic usage
Piranha -s deduped.bam -o peaks.bed
# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
# PAR-CLIP specific caller
peakachu adaptive \
-c control.bam \
-t treatment.bam \
-r reference.fa \
-o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
# Use narrow peak calling mode
macs3 callpeak \
-t deduped.bam \
-f BAM \
-g hs \
-n clip_peaks \
--nomodel \
--extsize 50 \
-q 0.01
Strand-Specific Peak Calling
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
sort -u > consensus_peaks.bed
# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
awk '$4 >= 2' | \
bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd
def load_clip_peaks(bed_path):
peaks = pd.read_csv(bed_path, sep='\t', header=None,
names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
return peaks
def peak_stats(peaks):
stats = {
'n_peaks': len(peaks),
'mean_width': (peaks['end'] - peaks['start']).mean(),
'median_score': peaks['score'].median(),
'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
}
return stats
peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))
Quality Metrics
| Metric | Good Value | Description | |--------|------------|-------------| | Peak count | 1,000-50,000 | Depends on RBP | | Peak width | 20-100 nt | Typical for RBP footprint | | FRiP | >0.1 | Fraction reads in peaks |
Calculate FRiP
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
# Total reads
total_reads=$(samtools view -c deduped.bam)
# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"
Related Skills
- clip-alignment - Generate aligned BAM
- clip-preprocessing - UMI deduplication
- binding-site-annotation - Annotate peaks with gene features
- clip-motif-analysis - Find enriched motifs in peaks
Ingénierie de Prompts
Data & IA
Bonnes pratiques et templates de prompt engineering pour maximiser les résultats IA.
Visualisation de Données
Data & IA
Génère des visualisations de données et graphiques adaptés à vos données.
Architecture RAG
Data & IA
Guide de configuration d'architectures RAG (Retrieval-Augmented Generation).