Approximate Nearest Neighbours

Algorithms that quickly find similar vectors in large datasets by trading perfect accuracy for speed.

Approximate Nearest Neighbours (ANN) algorithms find vectors that are “close enough” to a query vector without checking every single item. This trade-off between accuracy and speed is essential for candidate retrieval at scale.

Why Approximate?

Exact nearest neighbour search requires comparing the query to every vector in the database. For a catalog of 10 million items, that’s 10 million comparisons per request — far too slow for real-time recommendations.

ANN algorithms use clever indexing to reduce this to thousands or even hundreds of comparisons, while still finding most of the truly nearest neighbours.
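For contrast, here is what exact search looks like: one distance computation per item, every time. A minimal NumPy sketch with hypothetical array names:

```python
import numpy as np

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(10_000, 64)).astype(np.float32)  # toy catalogue
query = rng.normal(size=64).astype(np.float32)

# Exact search: one distance per item -- O(N * d) work for every query.
distances = np.linalg.norm(item_vectors - query, axis=1)
top_k = np.argsort(distances)[:10]  # indices of the 10 nearest items
```

At 10 million items this full scan is what ANN indexes are built to avoid.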

How ANN Works (Simplified)

Most ANN approaches divide the vector space into regions and only search relevant regions:

  1. Indexing (offline): Organise all item vectors into a searchable structure
  2. Querying (online): Given a query vector, quickly identify which regions to search
  3. Return top-K: Return the K most similar vectors found
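The three steps above can be sketched as a toy partition-based index in pure NumPy (all names and parameter values are hypothetical, chosen only to illustrate the idea):

```python
import numpy as np

rng = np.random.default_rng(1)
items = rng.normal(size=(5_000, 32)).astype(np.float32)

# 1. Indexing (offline): partition items by their nearest centroid.
num_cells = 50
centroids = items[rng.choice(len(items), num_cells, replace=False)]
cell_of = np.argmin(
    np.linalg.norm(items[:, None] - centroids[None], axis=2), axis=1
)
cells = {c: np.where(cell_of == c)[0] for c in range(num_cells)}

def ann_search(query, k=10, nprobe=5):
    # 2. Querying (online): probe only the few cells nearest the query.
    nearest_cells = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([cells[c] for c in nearest_cells])
    # 3. Return top-K: rank only the candidates, not the whole catalogue.
    dists = np.linalg.norm(items[candidates] - query, axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

query = rng.normal(size=32).astype(np.float32)
approx_ids, approx_dists = ann_search(query)
```

With `nprobe=5` of 50 cells, each query scans roughly a tenth of the items; raising `nprobe` trades speed back for recall, which is exactly the knob real ANN libraries expose.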

FAISS (Facebook)

Facebook's FAISS (Facebook AI Similarity Search) library, widely used in production systems. Supports GPU acceleration.

import faiss
import numpy as np

# Build index (vectors must be a float32 array with one row per item)
dimension = 128
index = faiss.IndexFlatL2(dimension)  # exact L2 search; IVF/HNSW index types are approximate
index.add(item_embeddings)            # shape (num_items, 128)

# Query: search expects a 2-D batch of query vectors
distances, indices = index.search(user_embedding.reshape(1, -1), 100)

ScaNN (Google)

Google’s Scalable Nearest Neighbours. Optimised for maximum inner product search.

import scann

searcher = scann.scann_ops_pybind.builder(
    item_embeddings,            # (num_items, dim) float32 array
    num_neighbors=100,
    distance_measure="dot_product"
).tree(
    num_leaves=1000,
    num_leaves_to_search=100    # partitions probed per query
).score_ah(2).build()

neighbours, distances = searcher.search(user_embedding)

Other Options

  • Annoy (Spotify) — Simple, memory-mapped, good for read-heavy workloads
  • HNSW (Hierarchical Navigable Small World) — Graph-based, very fast queries
  • Milvus — Open-source vector database with ANN built in

Trade-offs

| Factor        | More Accurate | Faster  |
| ------------- | ------------- | ------- |
| Index size    | Larger        | Smaller |
| Build time    | Longer        | Shorter |
| Query latency | Higher        | Lower   |
| Recall        | Higher        | Lower   |

You tune index parameters to balance these factors against your requirements. A typical production system might aim for:

  • 95% recall (finds 95% of true nearest neighbours)
  • <10ms query latency
  • Support for 10M+ vectors
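Recall@K is measured by comparing the ANN results against a brute-force exact search on a held-out set of queries. A minimal sketch with made-up result lists:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-K neighbours that the ANN search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical results for one query: the ANN index missed item 7.
exact_top10  = [3, 9, 1, 7, 42, 8, 15, 2, 30, 11]
approx_top10 = [3, 9, 1, 5, 42, 8, 15, 2, 30, 11]
print(recall_at_k(approx_top10, exact_top10))  # 0.9
```

Averaging this over many queries gives the recall figure you tune the index against.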

Inner Product vs L2 Distance

Two common similarity measures:

Dot Product (Inner Product)

  • Higher score = more similar
  • Sensitive to vector magnitude, so typically used when embeddings are not normalised
  • Common in recommendation systems (maximum inner product search)

L2 Distance (Euclidean)

  • Lower distance = more similar
  • Natural choice when embeddings are normalised
  • Ranks results identically to cosine similarity for normalised vectors
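The equivalence for normalised vectors follows from the identity ‖u − v‖² = 2 − 2·(u·v) when ‖u‖ = ‖v‖ = 1: L2 distance is a monotone function of the dot product, so both produce the same ranking. A quick NumPy check on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
vecs = rng.normal(size=(100, 16))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalise rows
q = rng.normal(size=16)
q /= np.linalg.norm(q)

dots = vecs @ q                        # dot product = cosine for unit vectors
l2 = np.linalg.norm(vecs - q, axis=1)  # Euclidean distance

# ||u - v||^2 = 2 - 2 * (u . v), so the two rankings agree exactly.
assert np.allclose(l2**2, 2 - 2 * dots)
assert np.array_equal(np.argsort(-dots), np.argsort(l2))
```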

Production Considerations

  • Index updates: How to add new items without rebuilding the entire index
  • Sharding: Splitting the index across multiple machines for scale
  • Filtering: Combining ANN with business rules (e.g., only show in-stock items)
  • Hybrid search: Combining vector search with keyword/attribute filters
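The simplest filtering strategy is post-filtering: over-fetch from the ANN index, then drop candidates that fail the business rules. A toy sketch (function and variable names are hypothetical):

```python
def filtered_top_k(ann_ids, in_stock, k=100):
    """Keep only in-stock candidates, preserving the ANN ranking order."""
    kept = [i for i in ann_ids if in_stock[i]]
    return kept[:k]

ann_ids = [4, 2, 9, 7, 1]  # ANN results, best first
in_stock = {1: True, 2: False, 4: True, 7: True, 9: False}
print(filtered_top_k(ann_ids, in_stock, k=3))  # [4, 7, 1]
```

If the filter is very selective, post-filtering may discard most candidates and require a large over-fetch factor; some vector databases instead apply the filter inside the index traversal.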
