Long-read variant calling with Clair3

VerifiedSafe

Deep learning-based variant calling from long reads using Clair3 for SNPs and small indels. Use when calling germline variants from ONT or PacBio alignments, particularly when high accuracy is needed for clinical or research applications. Supports platform-specific models, region-specific calling, gVCF output, phasing, and quality filtering.

By Skills Guide Bot
Data & AI · Intermediate
706/2/2026
Claude Code
#long-read-sequencing #variant-calling #clair3 #deep-learning

Our review

Calls germline variants (SNPs and small indels) from long-read sequencing alignments (ONT or PacBio HiFi) using Clair3's deep learning models.

Strengths

  • High accuracy variant calling for long reads
  • Supports both ONT and PacBio HiFi with platform-specific models
  • Includes options for region-specific calling, gVCF output, and phasing
  • Handles both human and non-human genomes

Limitations

  • Requires pre-installed Clair3 and appropriate model files
  • Not suitable for PacBio CLR (older platform)
  • High computational requirements (the examples assume 32 threads)

When to use it

When you need accurate variant calls from ONT or PacBio HiFi long-read sequencing data for clinical or research purposes.

When not to use it

When working with PacBio CLR data (use PEPPER-Margin-DeepVariant instead) or when short-read variant callers are sufficient.

Security analysis

Safe
Quality score: 80/100

The skill runs standard bioinformatics command-line tools (run_clair3.sh, bcftools) for variant calling, with no destructive, exfiltrating, or obfuscated actions. It uses run_shell_command but only in the context of legitimate data processing. No external downloads, piping to shell, or disabling of safety measures are instructed.

No concerns found

Examples

Call variants from ONT alignment
Run Clair3 to call SNPs and indels from an ONT alignment file sample.bam against reference reference.fasta with 32 threads.
Generate gVCF for joint calling
Run Clair3 with gVCF output on sample.bam for ONT data, using reference.fasta, and output to clair3_gvcf.
Call variants in targeted regions
Run Clair3 restricted to target_regions.bed from sample.bam with ont platform.
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-long-read-sequencing-clair3-variants
description: Deep learning-based variant calling from long reads using Clair3 for SNPs and small indels. Use when calling germline variants from ONT or PacBio alignments, particularly when high accuracy is needed for clinical or research applications.
tool_type: cli
primary_tool: Clair3
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Clair3 Variant Calling

Basic Usage

# ONT variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_output

# PacBio HiFi variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=hifi \
    --model_path=${CONDA_PREFIX}/bin/models/hifi \
    --output=clair3_output

# Output: clair3_output/merge_output.vcf.gz
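
Before launching either command above, it can help to confirm that the inputs are in place. The following is a minimal sketch, not part of Clair3: `preflight` is a hypothetical helper, and it assumes the BAM index sits next to the BAM (`.bai` or `.csi`) and the FASTA index next to the reference (`.fai`). It only checks that the files exist, not that they are valid.

```python
import shutil
from pathlib import Path


def preflight(bam, reference, model_path, executable="run_clair3.sh"):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    bam, reference, model_path = Path(bam), Path(reference), Path(model_path)

    if shutil.which(executable) is None:
        problems.append(f"{executable} not found on PATH")
    if not bam.is_file():
        problems.append(f"missing BAM: {bam}")
    # Assumed index naming: sample.bam.bai, sample.bai or sample.bam.csi
    index_candidates = [
        Path(str(bam) + ".bai"),
        bam.with_suffix(".bai"),
        Path(str(bam) + ".csi"),
    ]
    if not any(p.is_file() for p in index_candidates):
        problems.append(f"no index found for {bam} (try: samtools index)")
    if not reference.is_file():
        problems.append(f"missing reference: {reference}")
    if not Path(str(reference) + ".fai").is_file():
        problems.append(f"no .fai index for {reference} (try: samtools faidx)")
    if not model_path.is_dir():
        problems.append(f"model directory not found: {model_path}")
    return problems
```

A caller might print each returned problem and skip the Clair3 run whenever the list is non-empty.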

Platform-Specific Models

| Platform | Model | Recommended Coverage |
|----------|-------|---------------------|
| ONT R10 | r1041_e82_400bps_sup_v430 | 30-60x |
| ONT R9 | r941_prom_sup_g5014 | 30-60x |
| PacBio HiFi | hifi | 20-40x |
| PacBio CLR | - | Use PEPPER-Margin-DeepVariant |

# List available models
ls ${CONDA_PREFIX}/bin/models/

# Specify exact model (--platform is still required)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/r1041_e82_400bps_sup_v430 \
    --output=clair3_out \
    --threads=32
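
The model table above can be encoded as a small lookup so that a pipeline picks the model name and flags coverage outside the recommended range. This is only a sketch: `MODELS` and `check_coverage` are hypothetical names, the keys are labels invented for this example, and the mean coverage is assumed to have been measured elsewhere.

```python
# Values copied from the table above; the keys are labels chosen for this sketch.
MODELS = {
    "ont_r10": ("r1041_e82_400bps_sup_v430", 30, 60),
    "ont_r9": ("r941_prom_sup_g5014", 30, 60),
    "hifi": ("hifi", 20, 40),
}


def check_coverage(chemistry, mean_coverage):
    """Return (model_name, advice) for a chemistry label and a measured mean coverage."""
    if chemistry not in MODELS:
        raise ValueError(f"no Clair3 model listed for {chemistry!r}")
    model, low, high = MODELS[chemistry]
    if mean_coverage < low:
        advice = f"below the recommended {low}-{high}x"
    elif mean_coverage > high:
        advice = f"above the recommended {low}-{high}x"
    else:
        advice = f"within the recommended {low}-{high}x"
    return model, advice


print(check_coverage("ont_r10", 24.5))
# ('r1041_e82_400bps_sup_v430', 'below the recommended 30-60x')
```

PacBio CLR is deliberately absent from the lookup, so requesting it raises an error rather than silently choosing a model.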

Key Parameters

| Parameter | Description |
|-----------|-------------|
| --platform | ont, hifi, or ilmn |
| --model_path | Path to the trained model directory |
| --bed_fn | Restrict calling to the regions in a BED file |
| --include_all_ctgs | Call on all contigs (not just chr1-22,X,Y) |
| --no_phasing_for_fa | Skip phasing in the full-alignment calling step (experimental) |
| --gvcf | Output gVCF format |
| --qual | Quality cutoff: variants above it are marked PASS, the rest LowQual (default: 2) |

Region-Specific Calling

# Call variants in specific regions
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --bed_fn=target_regions.bed \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_targeted

# Call on non-human genomes (all contigs)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --include_all_ctgs \
    --threads=32 \
    --platform=hifi \
    --model_path=${CONDA_PREFIX}/bin/models/hifi \
    --output=clair3_all_contigs
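
The file passed to `--bed_fn` uses BED coordinates, which are 0-based and half-open, whereas region strings such as `chr1:1000-2000` are conventionally 1-based and inclusive. The sketch below converts between the two. `region_to_bed` and `write_bed` are hypothetical helpers, and the parser assumes contig names contain no colon.

```python
import re

# Assumes contig names without ':' (names such as HLA alleles would need a stricter parser)
_REGION = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def region_to_bed(region):
    """Convert 'chr1:1,000-2,000' (1-based, inclusive) to a BED line (0-based, half-open)."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"cannot parse region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # Only the start shifts by one; the inclusive end equals the exclusive BED end
    return f"{chrom}\t{start - 1}\t{end}"


def write_bed(regions, path):
    """Write one BED line per region string and return the path."""
    with open(path, "w") as handle:
        for region in regions:
            handle.write(region_to_bed(region) + "\n")
    return path
```

For example, `write_bed(["chr1:1000-2000"], "target_regions.bed")` would produce the single line `chr1	999	2000`, which could then be passed to `--bed_fn`.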

gVCF Output

# Generate gVCF for joint calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --gvcf \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_gvcf

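# Combine multiple samples into one file (bcftools merge joins the per-sample
# records without re-genotyping; inputs must be bgzip-compressed and indexed)
bcftools merge sample1.g.vcf.gz sample2.g.vcf.gz -Oz -o cohort.vcf.gz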

Phased Variant Calling

# Output phased variants (Clair3 phases reads internally, so a pre-haplotagged BAM should not be required)
run_clair3.sh \
    --bam_fn=haplotagged.bam \
    --ref_fn=reference.fasta \
    --enable_phasing \
    --longphase_for_phasing \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_phased

Quality Filtering

# Filter by quality score
bcftools view -i 'QUAL>20' clair3_output/merge_output.vcf.gz -Oz -o filtered.vcf.gz

# Filter by genotype quality
bcftools view -i 'GQ>30' clair3_output/merge_output.vcf.gz -Oz -o high_gq.vcf.gz

# SNPs only
bcftools view -v snps clair3_output/merge_output.vcf.gz -Oz -o snps.vcf.gz

# Indels only
bcftools view -v indels clair3_output/merge_output.vcf.gz -Oz -o indels.vcf.gz
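
To see roughly how many records a quality threshold would keep before writing filtered files, the VCF can be tallied directly. The sketch below is a simplified stand-in for the bcftools filters above, not a replacement: `tally_variants` is a hypothetical helper, and its SNP/indel rule is an approximation of what bcftools does.

```python
import gzip
from collections import Counter


def tally_variants(vcf_path, min_qual=20.0):
    """Count records with QUAL > min_qual, split into 'snp' and 'indel'.

    Simplified rule for this sketch: a record is a SNP when REF and every ALT
    allele are a single base; anything else is counted as an indel. bcftools
    applies a more detailed classification.
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines
            fields = line.rstrip("\n").split("\t")
            ref, alt, qual = fields[3], fields[4], fields[5]
            if alt == ".":
                continue  # no alternate allele recorded
            if qual == "." or float(qual) <= min_qual:
                continue
            alleles = [ref] + alt.split(",")
            kind = "snp" if all(len(a) == 1 for a in alleles) else "indel"
            counts[kind] += 1
    return counts
```

Calling `tally_variants("clair3_output/merge_output.vcf.gz", min_qual=20)` mirrors the `QUAL>20` filter and returns a `Counter` with `snp` and `indel` keys.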

Python Wrapper

import os
import subprocess
from pathlib import Path

def run_clair3(bam, reference, output_dir, platform='ont', model_path=None,
               threads=32, bed=None, gvcf=False, include_all_ctgs=False):
    # Default to the model directory shipped with a conda install of Clair3
    if model_path is None:
        conda_prefix = os.environ.get('CONDA_PREFIX', '')
        model_path = f'{conda_prefix}/bin/models/{platform}'

    cmd = [
        'run_clair3.sh',
        f'--bam_fn={bam}',
        f'--ref_fn={reference}',
        f'--threads={threads}',
        f'--platform={platform}',
        f'--model_path={model_path}',
        f'--output={output_dir}'
    ]

    if bed:
        cmd.append(f'--bed_fn={bed}')
    if gvcf:
        cmd.append('--gvcf')
    if include_all_ctgs:
        cmd.append('--include_all_ctgs')

    subprocess.run(cmd, check=True)
    return Path(output_dir) / 'merge_output.vcf.gz'

def filter_variants(vcf, output, min_qual=20, variant_type=None):
    cmd = ['bcftools', 'view', '-i', f'QUAL>{min_qual}']
    if variant_type:
        cmd.extend(['-v', variant_type])
    cmd.extend([vcf, '-Oz', '-o', output])
    subprocess.run(cmd, check=True)
    subprocess.run(['bcftools', 'index', '-t', output], check=True)
    return output

# Example
vcf = run_clair3('sample.bam', 'ref.fa', 'clair3_out', platform='hifi', threads=48)
snps = filter_variants(str(vcf), 'snps_q20.vcf.gz', min_qual=20, variant_type='snps')

Comparison with Other Callers

| Caller | Best For | Speed | Accuracy |
|--------|----------|-------|----------|
| Clair3 | ONT/HiFi germline | Fast | High |
| DeepVariant | HiFi, Illumina | Medium | Very high |
| PEPPER-DV | ONT (integrated) | Slow | Very high |
| Longshot | ONT SNPs | Fast | Good |

Troubleshooting

| Issue | Solution |
|-------|----------|
| Missing model | Download from Clair3 releases or use the models bundled with the conda install |
| Low call rate | Check coverage; lower the --qual threshold |
| Slow performance | Increase --threads or use --bed_fn for targeted calling |
| Missing calls on non-human contigs | Use --include_all_ctgs |

Docker Usage

# Using Docker
docker run -v /data:/data \
    hkubal/clair3:latest \
    /opt/bin/run_clair3.sh \
    --bam_fn=/data/sample.bam \
    --ref_fn=/data/reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=/data/clair3_output

# Singularity
singularity exec clair3.sif run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=clair3_output
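
The Docker invocation above can also be assembled as an argument list in Python, which avoids shell quoting issues when paths come from variables. This is a sketch under stated assumptions: `docker_clair3_cmd` is a hypothetical helper, all inputs are assumed to live in one host directory mounted at `/data`, and the image name, script path and model path are copied from the example above.

```python
from pathlib import PurePosixPath


def docker_clair3_cmd(host_dir, bam_name, ref_name, output_name="clair3_output",
                      platform="ont", threads=32, image="hkubal/clair3:latest"):
    """Build (but do not run) the docker command as an argument list."""
    mount = PurePosixPath("/data")  # container-side mount point, as in the example above
    return [
        "docker", "run",
        "-v", f"{host_dir}:{mount}",
        image,
        "/opt/bin/run_clair3.sh",
        f"--bam_fn={mount / bam_name}",
        f"--ref_fn={mount / ref_name}",
        f"--threads={threads}",
        f"--platform={platform}",
        f"--model_path=/opt/models/{platform}",
        f"--output={mount / output_name}",
    ]


cmd = docker_clair3_cmd("/data", "sample.bam", "reference.fasta")
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; keeping the build step separate makes the command easy to log or inspect first.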

Related Skills

  • variant-calling/bcftools-basics - VCF manipulation
  • variant-calling/filtering-best-practices - Quality filtering
  • long-read-sequencing/long-read-qc - Input quality control
  • long-read-sequencing/long-read-alignment - Mapping with minimap2
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->