Approximate Nearest Neighbours
Algorithms that quickly find similar vectors in large datasets by trading perfect accuracy for speed.
Approximate Nearest Neighbours (ANN) algorithms find vectors that are “close enough” to a query vector without checking every single item. This trade-off between accuracy and speed is essential for candidate retrieval at scale.
Why Approximate?
Exact nearest neighbour search requires comparing the query to every vector in the database. For a catalog of 10 million items, that’s 10 million comparisons per request — far too slow for real-time recommendations.
ANN algorithms use clever indexing to reduce this to thousands or even hundreds of comparisons, while still finding most of the truly nearest neighbours.
How ANN Works (Simplified)
Most ANN approaches divide the vector space into regions and only search the relevant regions (a toy sketch follows this list):
- Indexing (offline): Organise all item vectors into a searchable structure
- Querying (online): Given a query vector, quickly identify which regions to search
- Return top-K: Rank the candidates found and return the K most similar vectors
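To make these phases concrete, here is a toy partition-based search in numpy. The random data and the cell counts are arbitrary stand-ins, and real libraries use far more sophisticated structures, but the two-phase shape is the same:

import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(5_000, 32)).astype("float32")  # placeholder catalogue

# Indexing (offline): pick some vectors as centroids and bucket every item
# by its nearest centroid
n_cells, n_probe, k = 50, 5, 10
centroids = items[rng.choice(len(items), n_cells, replace=False)]
cell_of = np.argmin(((items[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

# Querying (online): only examine items in the few cells nearest the query
query = rng.normal(size=32).astype("float32")
nearest_cells = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
candidates = np.where(np.isin(cell_of, nearest_cells))[0]

# Return top-K: brute-force rank only the candidates, a fraction of the data
order = np.argsort(((items[candidates] - query) ** 2).sum(-1))[:k]
top_k = candidates[order]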
Popular ANN Libraries
FAISS (Facebook AI Similarity Search)
Facebook’s library, widely used in production systems. Supports GPU acceleration.
import faiss
import numpy as np

# Build index (IndexFlatL2 is exact brute force; FAISS's IVF and HNSW
# index types make the search approximate)
dimension = 128
item_embeddings = np.random.rand(100_000, dimension).astype("float32")  # placeholder; FAISS needs float32
index = faiss.IndexFlatL2(dimension)  # L2 distance
index.add(item_embeddings)  # Add all item vectors

# Query (search expects a 2D array: one row per query)
user_embedding = np.random.rand(1, dimension).astype("float32")
distances, indices = index.search(user_embedding, k=100)
ScaNN (Google)
Google’s Scalable Nearest Neighbours. Optimised for maximum inner product search.
import numpy as np
import scann

item_embeddings = np.random.rand(100_000, 128).astype("float32")  # placeholder data
searcher = scann.scann_ops_pybind.builder(
    item_embeddings,
    num_neighbors=100,
    distance_measure="dot_product"
).tree(num_leaves=1000, num_leaves_to_search=100).score_ah(2).build()
# num_leaves_to_search (leaves probed per query) is required and is the
# main recall/speed knob

# search() takes a single 1D query; use search_batched() for a 2D batch
user_embedding = np.random.rand(128).astype("float32")
neighbours, distances = searcher.search(user_embedding)
Other Options
- Annoy (Spotify) — Simple, memory-mapped, good for read-heavy workloads
- HNSW (Hierarchical Navigable Small World) — Graph-based algorithm with very fast queries, implemented in libraries such as hnswlib (see the sketch after this list)
- Milvus — Open-source vector database with ANN built in
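As an illustration of a graph-based index, here is a minimal HNSW sketch using the hnswlib library. The random data and the parameter values (M, ef_construction, ef) are placeholders to show which knobs exist, not recommendations:

import hnswlib
import numpy as np

dim, num_items = 128, 100_000
item_embeddings = np.random.rand(num_items, dim).astype("float32")  # placeholder

# Build the HNSW graph (offline); M and ef_construction trade build time
# and index size for graph quality
index = hnswlib.Index(space="l2", dim=dim)  # also supports "ip" and "cosine"
index.init_index(max_elements=num_items, M=16, ef_construction=200)
index.add_items(item_embeddings, np.arange(num_items))

# Query (online): ef controls how much of the graph is explored per query
# and must be at least k
index.set_ef(200)
labels, distances = index.knn_query(item_embeddings[:1], k=100)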
Trade-offs
| Factor | Tuned for accuracy | Tuned for speed |
|---|---|---|
| Index size | Larger | Smaller |
| Build time | Longer | Shorter |
| Query latency | Higher | Lower |
| Recall | Higher | Lower |
You tune these parameters based on your requirements (the sketch after this list shows one way to measure recall). A typical production system might aim for:
- 95% recall (finds 95% of true nearest neighbours)
- <10ms query latency
- Support for 10M+ vectors
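To check whether an index hits a recall target like this, a common approach is to compare its results against exact brute-force search on a sample of queries. A minimal sketch using FAISS with random placeholder data (the nlist and nprobe values are arbitrary):

import numpy as np
import faiss

dim, n_items, k = 128, 100_000, 100
xb = np.random.rand(n_items, dim).astype("float32")  # placeholder catalogue
xq = np.random.rand(100, dim).astype("float32")      # 100 sample queries

# Ground truth: exact brute-force search
flat = faiss.IndexFlatL2(dim)
flat.add(xb)
_, ground_truth = flat.search(xq, k)

# Approximate IVF index; nprobe is the recall/latency knob
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 256)  # 256 partitions
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 8                                 # probe 8 of 256 partitions
_, approx = ivf.search(xq, k)

# Recall@k: fraction of true nearest neighbours the approximate search found
recall = np.mean([len(set(a) & set(g)) / k for a, g in zip(approx, ground_truth)])
print(f"recall@{k}: {recall:.3f}")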
Inner Product vs L2 Distance
Two common similarity measures:
Dot Product (Inner Product)
- Higher score = more similar
- Works with unnormalised embeddings, where vector magnitude (e.g., item popularity) can influence the score
- Common in recommendation systems
L2 Distance (Euclidean)
- Lower distance = more similar
- Often used with normalised embeddings
- For normalised vectors, ranking by L2 distance is equivalent to ranking by cosine similarity
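The equivalence follows from the identity ||u - v||^2 = 2 - 2(u · v) for unit vectors, so the nearest vector by L2 distance is also the most similar by dot product and by cosine. A quick numpy check with random unit vectors, purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=128)
v = rng.normal(size=128)
u /= np.linalg.norm(u)  # normalise to unit length
v /= np.linalg.norm(v)

# For unit vectors, squared L2 distance is 2 - 2 * dot product,
# and the dot product equals cosine similarity
sq_l2 = np.sum((u - v) ** 2)
dot = u @ v
cosine = dot / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.isclose(sq_l2, 2 - 2 * dot))  # True
print(np.isclose(dot, cosine))         # True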
Production Considerations
- Index updates: How to add new items without rebuilding the entire index
- Sharding: Splitting the index across multiple machines for scale
- Filtering: Combining ANN with business rules (e.g., only show in-stock items); a common pattern is sketched after this list
- Hybrid search: Combining vector search with keyword/attribute filters
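For the filtering point, one common pattern is to over-fetch candidates from the ANN index and apply business rules afterwards. A minimal sketch assuming a FAISS-style index; the in_stock set and the 4x over-fetch factor are illustrative assumptions, not a fixed recipe:

def retrieve_in_stock(index, query, item_ids, in_stock, k=100, overfetch=4):
    # Over-fetch so enough candidates survive the filter; the 4x factor
    # is a guess: tune it to how aggressive your filters are
    _, idx = index.search(query.reshape(1, -1), k * overfetch)
    # Map index positions to catalogue ids, dropping padding (-1) entries
    candidates = [item_ids[i] for i in idx[0] if i != -1]
    # Business rule: keep only items currently in stock (hypothetical set)
    return [item for item in candidates if item in in_stock][:k]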