Our review

This skill guides users through tuning vector indexes (for example, HNSW parameters and quantization settings) to balance latency, recall, and memory usage in production.

Strengths

Provides concrete benchmarking and validation steps
Covers parameter sweeping for HNSW and quantization techniques
Includes safety and rollback guidance for production changes

Limitations

Assumes basic familiarity with vector search concepts
Does not cover end-to-end retrieval system design beyond index tuning
Requires access to workload metrics and ground truth data for validation

When to use it

Use when you need to tune HNSW parameters, implement quantization, or scale vector search to billions of vectors with strict latency/recall goals.

When not to use it

Do not use if you only need exact search on small datasets (use a flat index) or if you lack workload metrics to validate recall.

Examples

Tune HNSW parameters

I need to tune HNSW parameters (efConstruction, M, efSearch) for a 10M vector dataset with 768-dimensional embeddings. Target: 95% recall at <50ms latency and 10GB memory budget. Help me run a parameter sweep and interpret results.

Select quantization strategy

My vector index uses 1M 512-dimensional float vectors but memory is too high. Which quantization method (e.g., PQ, SQ, binary) should I choose to reduce memory by 4x while keeping recall above 90%? Provide a benchmarking plan.

Scale index to billions

We're scaling our vector search to 1B vectors with HNSW. How should I shard the index across multiple nodes and tune efConstruction/efSearch to maintain sub-100ms latency? Include memory estimates and cost considerations.
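Each of the prompts above hinges on a memory budget, so a back-of-envelope estimate is worth doing before any benchmark. The sketch below is a rough approximation, not a vendor formula: it assumes raw vectors are stored alongside the graph, that each node keeps about 2 * M neighbour ids of 4 bytes on the base layer, and it ignores upper layers and allocator overhead. The function name and its parameters are hypothetical.

```python
def hnsw_memory_bytes(num_vectors: int, dim: int, m: int,
                      bytes_per_component: int = 4,
                      bytes_per_link: int = 4) -> int:
    """Rough HNSW footprint: raw vectors plus base-layer graph links.

    Assumption: each node stores about 2 * m neighbour ids on the base
    layer; upper layers and allocator overhead are ignored.
    """
    vector_bytes = num_vectors * dim * bytes_per_component
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


if __name__ == "__main__":
    # Numbers from the first example prompt: 10M vectors, 768 dimensions.
    # M=16 is an illustrative choice, not a recommendation.
    for label, width in (("float32", 4), ("int8 scalar-quantized", 1)):
        total = hnsw_memory_bytes(10_000_000, 768, m=16,
                                  bytes_per_component=width)
        print(f"{label}: {total / 1e9:.2f} GB")
```

Under these assumptions the float32 index in the first prompt comes to roughly 32 GB, well over its 10GB budget, while 8-bit components bring it to about 9 GB. That gap is why quantization tends to enter the conversation before any HNSW parameter is touched.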

name: vector-index-tuning
description: "Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure."
metadata:
  author: ncdevshiv
  version: "1.0"
  category: other
  updated: 2026-02-25
  risk: unknown
  source: community

Vector Index Tuning

Guide to optimizing vector indexes for production performance.

Use this skill when

Tuning HNSW parameters
Implementing quantization
Optimizing memory usage
Reducing search latency
Balancing recall vs speed
Scaling to billions of vectors

Do not use this skill when

You only need exact search on small datasets (use a flat index)
You lack workload metrics or ground truth to validate recall
You need end-to-end retrieval system design beyond index tuning

Instructions

Gather workload targets (latency, recall, QPS), data size, and memory budget.
Choose an index type and establish a baseline with default parameters.
Benchmark parameter sweeps using real queries and track recall, latency, and memory.
Validate changes on a staging dataset before rolling out to production.
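The benchmarking step above can be sketched as a small harness that is independent of any particular index library. This is a minimal illustration under stated assumptions: `build_search` is a hypothetical callback that you would implement to return a search function for a given parameter set (for example, by setting efSearch on your index), and `truth` holds exact top-k ids computed ahead of time with a flat index. Memory tracking is omitted because it depends on the library in use.

```python
import time
from typing import Callable, Dict, Iterable, List, Sequence

# A search function takes (query, k) and returns a list of neighbour ids.
SearchFn = Callable[[object, int], List[int]]


def recall_at_k(retrieved: Sequence[Sequence[int]],
                truth: Sequence[Sequence[int]], k: int) -> float:
    """Mean fraction of the true top-k ids found in the retrieved top-k."""
    hits = 0
    for got, want in zip(retrieved, truth):
        hits += len(set(got[:k]) & set(want[:k]))
    return hits / (k * len(truth))


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling division
    return ordered[rank - 1]


def sweep(build_search: Callable[[Dict], SearchFn],
          param_grid: Iterable[Dict],
          queries: Sequence[object],
          truth: Sequence[Sequence[int]],
          k: int) -> List[Dict]:
    """Run every parameter set over the queries; record recall and p95 latency."""
    rows = []
    for params in param_grid:
        search = build_search(params)
        results, latencies_ms = [], []
        for query in queries:
            start = time.perf_counter()
            results.append(search(query, k))
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
        rows.append({
            "params": params,
            "recall": recall_at_k(results, truth, k),
            "p95_ms": percentile(latencies_ms, 95),
        })
    return rows
```

Each returned row can then be compared against the workload targets gathered in the first step, keeping only parameter sets that meet the recall floor and picking the fastest among them.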

Refer to resources/implementation-playbook.md for detailed patterns, checklists, and templates.

Safety

Avoid reindexing in production without a rollback plan.
Validate changes under realistic load before applying globally.
Track recall regressions and revert if quality drops.
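The revert rule in the last point can be made explicit as a small guard that compares a candidate configuration against the current baseline. This is a sketch only: the `Snapshot` type, the function name, and both thresholds (a 0.01 absolute recall drop and a 1.2x latency ratio) are illustrative assumptions, to be replaced with limits derived from your own workload targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Metrics measured for one index configuration under realistic load."""
    recall: float   # recall@k against ground truth, in [0, 1]
    p95_ms: float   # 95th-percentile query latency in milliseconds


def should_rollback(baseline: Snapshot, candidate: Snapshot,
                    max_recall_drop: float = 0.01,
                    max_latency_ratio: float = 1.2) -> bool:
    """Return True when the candidate regresses beyond the allowed margins.

    Both default thresholds are placeholders, not recommendations.
    """
    recall_regressed = baseline.recall - candidate.recall > max_recall_drop
    latency_regressed = candidate.p95_ms > baseline.p95_ms * max_latency_ratio
    return recall_regressed or latency_regressed
```

Running the same check on staging and again after a partial production rollout gives one consistent criterion for reverting, rather than a judgment call made under pressure.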

Resources

resources/implementation-playbook.md for detailed patterns, checklists, and templates.
