Long-Read Structural Variant Detection Pipeline

End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads.

by Skills Guide Bot
Data & AI · Intermediate
806/2/2026
Claude Code
#long-read-sequencing #structural-variants #sniffles #bioinformatics-pipeline

Our review

End-to-end workflow for detecting structural variants from long-read sequencing data (ONT/PacBio), covering minimap2 alignment and SV calling with Sniffles or cuteSV.

Strengths

  • Integrates QC, alignment, variant calling, and filtering in one pipeline
  • Offers two calling options (Sniffles and cuteSV) for flexibility
  • Includes quantifiable QC checkpoints at each major step
  • Optional annotation with AnnotSV for clinical interpretation

Limitations

  • Requires good command-line bioinformatics knowledge
  • Performance depends on reference quality and coverage (>10x recommended)
  • Optional annotation step is not fully detailed

When to use it

Use this workflow when you have long-read data (ONT or PacBio) and need to detect structural variants such as deletions, insertions, duplications, and inversions from single or multiple samples.

When not to use it

Do not use for short-read sequencing data or for detecting small variants (SNVs/indels), where tools like GATK are more suitable.

Security analysis

Caution
Quality score: 95/100

The skill uses shell commands to execute bioinformatics tools (minimap2, Sniffles, etc.), but all commands are legitimate and non-destructive. No network exfiltration or system compromise risk detected. Risk is limited to resource usage.

No concerns found

Examples

Full ONT SV detection

I have ONT long reads in reads.fastq.gz and a reference genome hg38.fa. Run the complete structural variant pipeline: QC with NanoPlot, align with minimap2 (map-ont), call SVs with Sniffles2 (minimum SV length 50), filter by QUAL>=20 and |SVLEN|>=50, and produce a final filtered VCF.

Alternative cuteSV pipeline

I have PacBio HiFi reads aligned to hg38 (aligned.bam). Use cuteSV to call structural variants with min_size 50, genotype them, and output a compressed VCF. Filter to keep only homozygous deletions with |SVLEN|>=100 (deletions are usually reported with a negative SVLEN).

Multi-sample SV calling with Sniffles

I have a list of BAM files (samples*.bam) from ONT long reads. Perform multi-sample SV calling using Sniffles in population mode, then merge and filter by quality and size.
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-workflows-longread-sv-pipeline
description: End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads.
tool_type: cli
primary_tool: Sniffles
workflow: true
depends_on:
  • long-read-sequencing/long-read-alignment
  • long-read-sequencing/long-read-qc
  • long-read-sequencing/structural-variants
qc_checkpoints:
  • after_qc: "Read N50 >10kb, quality score >Q10"
  • after_alignment: "Mapping rate >90%, coverage sufficient"
  • after_calling: "SV count reasonable, genotypes concordant"
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  • read_file
  • run_shell_command

Long-Read SV Pipeline

Complete workflow for detecting structural variants from ONT or PacBio long-read data.

Workflow Overview

Long reads (ONT/PacBio)
    |
    v
[1. QC] ----------------> NanoPlot
    |
    v
[2. Alignment] ---------> minimap2
    |
    v
[3. SV Calling] --------> Sniffles / cuteSV
    |
    v
[4. Filtering] ---------> bcftools
    |
    v
[5. Annotation] --------> AnnotSV (optional)
    |
    v
Filtered SV VCF

Primary Path: minimap2 + Sniffles

Step 1: Quality Control

# ONT reads QC
NanoPlot --fastq reads.fastq.gz \
    --outdir nanoplot_output \
    --threads 8

# Check key metrics
# - Read N50 should be >10kb
# - Mean quality >Q10
# - Total bases sufficient for coverage
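NanoPlot reports these metrics directly, but the N50 threshold can also be cross-checked with standard tools. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `fastq_lengths` and `read_n50` are hypothetical helper names, not NanoPlot features.

```shell
# Hypothetical helpers (not part of NanoPlot): estimate read N50 from FASTQ.
# Assumption: every record spans exactly four lines.

# Print one read length per line from FASTQ text on stdin.
fastq_lengths() {
    awk 'NR % 4 == 2 { print length($0) }'
}

# Print the N50 of the lengths given on stdin: the length of the read at
# which the cumulative sum, taken from longest to shortest, reaches half
# of the total bases.
read_n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= half) { print len[i]; exit }
            }
        }'
}

# Usage: gzip -dc reads.fastq.gz | fastq_lengths | read_n50
```

A value below 10000 from `read_n50` would fail the first checkpoint listed above.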

Step 2: Alignment with minimap2

# ONT reads
minimap2 -ax map-ont \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

samtools index aligned.bam

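# PacBio HiFi (run instead of the ONT command; each variant writes aligned.bam)
minimap2 -ax map-hifi \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

samtools index aligned.bam

# PacBio CLR (run instead of the ONT command)
minimap2 -ax map-pb \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

samtools index aligned.bam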

QC Checkpoint: Check alignment stats

samtools flagstat aligned.bam
samtools depth -a aligned.bam | awk '{sum+=$3} END {print "Average coverage:",sum/NR}'
  • Mapping rate >90%
  • Average coverage >10x for SV calling (>20x preferred)
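The two thresholds above can be checked automatically rather than read off the screen. This is a rough sketch under stated assumptions: it parses the text layout that `samtools flagstat` and `samtools depth -a` normally print, and the helper names `mapping_rate`, `mean_depth`, and `exceeds` are hypothetical.

```shell
# Hypothetical helpers for the alignment checkpoint. They only parse text,
# so samtools is needed only to produce the input.

# Extract the overall mapped percentage from `samtools flagstat` output on stdin.
# Matches the "N + M mapped (P% : ...)" line and skips "primary mapped".
mapping_rate() {
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }'
}

# Mean depth from `samtools depth -a` output (chrom, pos, depth) on stdin.
# Prints 0.00 for empty input instead of dividing by zero.
mean_depth() {
    awk '{ sum += $3 } END { if (NR > 0) printf "%.2f\n", sum / NR; else print "0.00" }'
}

# Succeed when VALUE ($1) is strictly greater than THRESHOLD ($2).
exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Usage:
#   rate=$(samtools flagstat aligned.bam | mapping_rate)
#   exceeds "$rate" 90 || echo "WARNING: mapping rate ${rate}% is not above 90%" >&2
#   cov=$(samtools depth -a aligned.bam | mean_depth)
#   exceeds "$cov" 10 || echo "WARNING: mean coverage ${cov}x is not above 10x" >&2
```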

Step 3: SV Calling with Sniffles

# Sniffles2 (recommended)
sniffles \
    --input aligned.bam \
    --vcf svs.vcf.gz \
    --reference reference.fa \
    --threads 8 \
    --minsvlen 50

# With tandem repeat annotations (recommended)
sniffles \
    --input aligned.bam \
    --vcf svs.vcf.gz \
    --reference reference.fa \
    --tandem-repeats tandem_repeats.bed \
    --threads 8

Alternative: cuteSV

# cuteSV (faster, good for ONT); the working directory must exist before the run
mkdir -p work_dir
cuteSV \
    aligned.bam \
    reference.fa \
    svs.vcf \
    work_dir/ \
    --threads 8 \
    --min_size 50 \
    --genotype

bgzip svs.vcf
tabix svs.vcf.gz

Step 4: Filtering

# Filter by quality and size
bcftools view -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf.gz -Oz -o svs.filtered.vcf.gz

# Filter by SV type
bcftools view -i 'SVTYPE="DEL" || SVTYPE="INS"' svs.filtered.vcf.gz -Oz -o del_ins.vcf.gz

# Filter by genotype
bcftools view -i 'GT="1/1" || GT="0/1"' svs.filtered.vcf.gz -Oz -o genotyped.vcf.gz

# Stats
bcftools stats svs.filtered.vcf.gz > sv_stats.txt
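One subtlety in the size filter above: breakend (BND) records usually carry no SVLEN, and a bcftools comparison against a missing value does not match, so `ABS(SVLEN)>=50` would discard them along with the short calls. The sketch below builds an expression that keeps them. `sv_filter_expr` is a hypothetical helper, and whether your caller omits SVLEN on BND records is an assumption to verify against your own VCF.

```shell
# Hypothetical helper: print a bcftools -i expression that applies the
# quality and size thresholds but keeps BND records, which are assumed
# to have no INFO/SVLEN.
#   $1 = minimum QUAL (default 20), $2 = minimum absolute SVLEN (default 50)
sv_filter_expr() {
    min_qual="${1:-20}"
    min_len="${2:-50}"
    printf 'QUAL>=%s && (ABS(INFO/SVLEN)>=%s || INFO/SVTYPE="BND")' \
        "$min_qual" "$min_len"
}

# Usage (requires bcftools):
#   bcftools view -i "$(sv_filter_expr 20 50)" svs.vcf.gz -Oz -o svs.filtered.vcf.gz
```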

Step 5: Annotation (Optional)

# AnnotSV for gene/clinical annotations
AnnotSV -SVinputFile svs.filtered.vcf.gz \
    -outputFile annotated_svs \
    -genomeBuild GRCh38

Multi-Sample SV Calling

# Call SVs per sample
for sample in sample1 sample2 sample3; do
    sniffles --input ${sample}.bam \
        --snf ${sample}.snf \
        --reference reference.fa
done

# Merge and joint genotype
sniffles --input sample1.snf sample2.snf sample3.snf \
    --vcf merged_svs.vcf.gz \
    --reference reference.fa
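With more than a handful of samples, typing each .snf path by hand becomes error-prone. A minimal sketch of deriving them from the BAM names follows. `snf_paths` is a hypothetical helper, and it assumes each BAM ends in `.bam` and that paths contain no whitespace.

```shell
# Hypothetical helper: map BAM paths to the .snf paths used for the merge step.
# Prints one path per line, replacing the trailing ".bam" with ".snf".
snf_paths() {
    for bam in "$@"; do
        printf '%s\n' "${bam%.bam}.snf"
    done
}

# Usage (requires sniffles):
#   for bam in sample*.bam; do
#       sniffles --input "$bam" --snf "${bam%.bam}.snf" --reference reference.fa
#   done
#   # The unquoted substitution relies on whitespace-free paths.
#   sniffles --input $(snf_paths sample*.bam) --vcf merged_svs.vcf.gz --reference reference.fa
```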

Parameter Recommendations

| Tool | Parameter | ONT | PacBio HiFi |
|------|-----------|-----|-------------|
| minimap2 | -ax | map-ont | map-hifi |
| Sniffles | --minsvlen | 50 | 50 |
| Sniffles | --minsupport | auto | auto |
| cuteSV | --min_size | 50 | 50 |
| cuteSV | --min_support | 3 | 3 |
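The preset choice in the table can be encoded once so a single alignment command serves every platform. This is a sketch only: `minimap2_preset` and its platform labels (`ont`, `hifi`, `clr`) are hypothetical names, with the CLR mapping taken from Step 2.

```shell
# Hypothetical helper: translate a platform label into a minimap2 -ax preset.
# Returns non-zero and prints a message to stderr for an unknown label.
minimap2_preset() {
    case "$1" in
        ont)  echo "map-ont" ;;
        hifi) echo "map-hifi" ;;
        clr)  echo "map-pb" ;;
        *)    echo "unknown platform: $1 (expected ont, hifi or clr)" >&2
              return 1 ;;
    esac
}

# Usage (requires minimap2 and samtools):
#   preset=$(minimap2_preset hifi) || exit 1
#   minimap2 -ax "$preset" -t 16 --MD -Y reference.fa reads.fastq.gz | \
#       samtools sort -@ 4 -o aligned.bam
```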

SV Types Detected

| Type | Abbreviation | Description |
|------|--------------|-------------|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Duplication | DUP | Sequence copied |
| Inversion | INV | Sequence reversed |
| Translocation | BND | Breakend (interchromosomal) |
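A per-type tally is a quick way to judge whether the SV count looks reasonable after calling. The sketch below reads uncompressed VCF text and counts records by their INFO `SVTYPE` value. `count_svtypes` is a hypothetical helper, and it assumes the caller writes `SVTYPE=` into the INFO column, as Sniffles and cuteSV normally do.

```shell
# Hypothetical helper: count SV records per SVTYPE from VCF text on stdin.
# Output: one "TYPE<TAB>COUNT" line per type, sorted by type name.
count_svtypes() {
    awk -F'\t' '
        !/^#/ {
            # INFO is column 8; split it into its key=value entries.
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^SVTYPE=/) {
                    sub(/^SVTYPE=/, "", kv[i])
                    count[kv[i]]++
                }
            }
        }
        END { for (t in count) print t "\t" count[t] }' | LC_ALL=C sort
}

# Usage (requires bcftools): bcftools view svs.filtered.vcf.gz | count_svtypes
```

An unusually large share of BND records in this tally corresponds to the "High breakend count" row in the troubleshooting table.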

Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| Few SVs | Low coverage | Increase sequencing depth |
| Many false positives | Low quality reads | Filter by QUAL, increase min support |
| Missing known SV | Repeat region | Use tandem repeat annotations |
| High breakend count | Mapping artifacts | Check alignment quality |

Complete Pipeline Script

#!/bin/bash
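# Stop on errors, on unset variables, and on failures inside pipelines
# (plain `set -e` would miss a minimap2 failure hidden behind samtools sort)
set -euo pipefail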

THREADS=16
READS="reads.fastq.gz"
REF="reference.fa"
SAMPLE="sample1"
OUTDIR="sv_results"

mkdir -p ${OUTDIR}/{qc,aligned,sv}

# Step 1: QC
echo "=== QC ==="
NanoPlot --fastq ${READS} --outdir ${OUTDIR}/qc -t ${THREADS}

# Step 2: Alignment
echo "=== Alignment ==="
minimap2 -ax map-ont -t ${THREADS} --MD -Y ${REF} ${READS} | \
    samtools sort -@ 4 -o ${OUTDIR}/aligned/${SAMPLE}.bam
samtools index ${OUTDIR}/aligned/${SAMPLE}.bam

echo "Alignment stats:"
samtools flagstat ${OUTDIR}/aligned/${SAMPLE}.bam

# Step 3: SV calling
echo "=== SV Calling ==="
sniffles --input ${OUTDIR}/aligned/${SAMPLE}.bam \
    --vcf ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
    --reference ${REF} \
    --threads ${THREADS}

# Step 4: Filter
echo "=== Filtering ==="
bcftools view -i 'QUAL>=20' ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
    -Oz -o ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
bcftools index ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz

# Stats
bcftools stats ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz > ${OUTDIR}/sv/stats.txt

echo "=== Complete ==="
echo "SVs: $(bcftools view -H ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz | wc -l)"
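The script above aborts on the first failing command, so a tool missing from PATH may only surface after the long alignment step has already run. A small preflight check avoids that. The sketch below is one way to do it, and `require_tools` is a hypothetical helper name.

```shell
# Hypothetical preflight helper: report every missing tool, then fail once.
# Returns 0 when all named commands are found on PATH, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage near the top of the pipeline script:
#   require_tools NanoPlot minimap2 samtools sniffles bcftools || exit 1
```

Checking all tools before reporting lets a single run list everything that needs installing.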

Related Skills

  • long-read-sequencing/long-read-alignment - minimap2 details
  • long-read-sequencing/structural-variants - Sniffles, cuteSV options
  • long-read-sequencing/long-read-qc - NanoPlot metrics
  • variant-calling/structural-variant-calling - Short-read SV methods
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->