Approximate Nearest Neighbours

Algorithms that quickly find similar vectors in large datasets by trading perfect accuracy for speed.

Approximate Nearest Neighbours (ANN) algorithms find vectors that are “close enough” to a query vector without checking every single item. This trade-off between accuracy and speed is essential for candidate retrieval at scale.

Why Approximate?

Exact nearest neighbour search requires comparing the query to every vector in the database. For a catalog of 10 million items, that’s 10 million comparisons per request — far too slow for real-time recommendations.

ANN algorithms use clever indexing to reduce this to thousands or even hundreds of comparisons, while still finding most of the truly nearest neighbours.
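For contrast, here is what exact search looks like: one distance computation per item, every time. A minimal NumPy sketch with hypothetical array names:

```python
import numpy as np

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(10_000, 64)).astype(np.float32)  # toy catalogue
query = rng.normal(size=64).astype(np.float32)

# Exact search: one distance per item -- O(N * d) work for every query.
distances = np.linalg.norm(item_vectors - query, axis=1)
top_k = np.argsort(distances)[:10]  # indices of the 10 nearest items
```

At 10 million items this full scan is what ANN indexes are built to avoid.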

How ANN Works (Simplified)

Most ANN approaches divide the vector space into regions and only search relevant regions:

  1. Indexing (offline): Organise all item vectors into a searchable structure
  2. Querying (online): Given a query vector, quickly identify which regions to search
  3. Return top-K: Return the K most similar vectors found
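The three steps above can be sketched as a toy partition-based index in pure NumPy (all names and parameter values are hypothetical, chosen only to illustrate the idea):

```python
import numpy as np

rng = np.random.default_rng(1)
items = rng.normal(size=(5_000, 32)).astype(np.float32)

# 1. Indexing (offline): partition items by their nearest centroid.
num_cells = 50
centroids = items[rng.choice(len(items), num_cells, replace=False)]
cell_of = np.argmin(
    np.linalg.norm(items[:, None] - centroids[None], axis=2), axis=1
)
cells = {c: np.where(cell_of == c)[0] for c in range(num_cells)}

def ann_search(query, k=10, nprobe=5):
    # 2. Querying (online): probe only the few cells nearest the query.
    nearest_cells = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([cells[c] for c in nearest_cells])
    # 3. Return top-K: rank only the candidates, not the whole catalogue.
    dists = np.linalg.norm(items[candidates] - query, axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

query = rng.normal(size=32).astype(np.float32)
approx_ids, approx_dists = ann_search(query)
```

With `nprobe=5` of 50 cells, each query scans roughly a tenth of the items; raising `nprobe` trades speed back for recall, which is exactly the knob real ANN libraries expose.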

FAISS (Facebook)

Facebook's FAISS (Facebook AI Similarity Search) library, widely used in production systems. Supports GPU acceleration.

import faiss
import numpy as np

# Build index (vectors must be a float32 array with one row per item)
dimension = 128
index = faiss.IndexFlatL2(dimension)  # exact L2 search; IVF/HNSW index types are approximate
index.add(item_embeddings)            # shape (num_items, 128)

# Query: search expects a 2-D batch of query vectors
distances, indices = index.search(user_embedding.reshape(1, -1), 100)

ScaNN (Google)

Google’s Scalable Nearest Neighbours. Optimised for maximum inner product search.

import scann

searcher = scann.scann_ops_pybind.builder(
    item_embeddings,            # (num_items, dim) float32 array
    num_neighbors=100,
    distance_measure="dot_product"
).tree(
    num_leaves=1000,
    num_leaves_to_search=100    # partitions probed per query
).score_ah(2).build()

neighbours, distances = searcher.search(user_embedding)

Other Options

  • Annoy (Spotify) — Simple, memory-mapped, good for read-heavy workloads
  • HNSW (Hierarchical Navigable Small World) — Graph-based, very fast queries
  • Milvus — Open-source vector database with ANN built in

Trade-offs

| Factor        | More Accurate | Faster  |
| ------------- | ------------- | ------- |
| Index size    | Larger        | Smaller |
| Build time    | Longer        | Shorter |
| Query latency | Higher        | Lower   |
| Recall        | Higher        | Lower   |

You tune index parameters to balance these factors against your requirements. A typical production system might aim for:

  • 95% recall (finds 95% of true nearest neighbours)
  • <10ms query latency
  • Support for 10M+ vectors
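Recall@K is measured by comparing the ANN results against a brute-force exact search on a held-out set of queries. A minimal sketch with made-up result lists:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-K neighbours that the ANN search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical results for one query: the ANN index missed item 7.
exact_top10  = [3, 9, 1, 7, 42, 8, 15, 2, 30, 11]
approx_top10 = [3, 9, 1, 5, 42, 8, 15, 2, 30, 11]
print(recall_at_k(approx_top10, exact_top10))  # 0.9
```

Averaging this over many queries gives the recall figure you tune the index against.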

Inner Product vs L2 Distance

Two common similarity measures:

Dot Product (Inner Product)

  • Higher score = more similar
  • Sensitive to vector magnitude, so typically used when embeddings are not normalised
  • Common in recommendation systems (maximum inner product search)

L2 Distance (Euclidean)

  • Lower distance = more similar
  • Natural choice when embeddings are normalised
  • Ranks results identically to cosine similarity for normalised vectors
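The equivalence for normalised vectors follows from the identity ‖u − v‖² = 2 − 2·(u·v) when ‖u‖ = ‖v‖ = 1: L2 distance is a monotone function of the dot product, so both produce the same ranking. A quick NumPy check on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
vecs = rng.normal(size=(100, 16))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalise rows
q = rng.normal(size=16)
q /= np.linalg.norm(q)

dots = vecs @ q                        # dot product = cosine for unit vectors
l2 = np.linalg.norm(vecs - q, axis=1)  # Euclidean distance

# ||u - v||^2 = 2 - 2 * (u . v), so the two rankings agree exactly.
assert np.allclose(l2**2, 2 - 2 * dots)
assert np.array_equal(np.argsort(-dots), np.argsort(l2))
```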

Production Considerations

  • Index updates: How to add new items without rebuilding the entire index
  • Sharding: Splitting the index across multiple machines for scale
  • Filtering: Combining ANN with business rules (e.g., only show in-stock items)
  • Hybrid search: Combining vector search with keyword/attribute filters
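The simplest filtering strategy is post-filtering: over-fetch from the ANN index, then drop candidates that fail the business rules. A toy sketch (function and variable names are hypothetical):

```python
def filtered_top_k(ann_ids, in_stock, k=100):
    """Keep only in-stock candidates, preserving the ANN ranking order."""
    kept = [i for i in ann_ids if in_stock[i]]
    return kept[:k]

ann_ids = [4, 2, 9, 7, 1]  # ANN results, best first
in_stock = {1: True, 2: False, 4: True, 7: True, 9: False}
print(filtered_top_k(ann_ids, in_stock, k=3))  # [4, 7, 1]
```

If the filter is very selective, post-filtering may discard most candidates and require a large over-fetch factor; some vector databases instead apply the filter inside the index traversal.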
