Candidate Retrieval

The first stage of a recommender system that quickly narrows millions of items down to a smaller set for ranking.

Candidate retrieval (also called candidate generation) is the first stage in a two-stage recommender system. Its job is to quickly filter millions of items down to a manageable set (typically hundreds or thousands) that a more expensive ranking model can score.

Why Separate Retrieval from Ranking?

Scoring every item in a large catalog with a complex model is too slow for real-time requests. Retrieval acts as a fast, coarse filter:

  • Retrieval: Fast, less precise, handles millions of items
  • Ranking: Slower, more precise, handles hundreds of items
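
For illustration, here is a minimal sketch of the two-stage flow; the retriever and ranker callables are hypothetical stand-ins, not a prescribed API.

# Minimal two-stage pipeline sketch: a cheap retriever narrows the catalog,
# then a heavier ranker scores only the survivors.
def recommend(user_id, retriever, ranker, top_k=1000, n_results=20):
    candidates = retriever(user_id, top_k)                             # fast, coarse: millions -> ~1000 items
    scored = [(item, ranker(user_id, item)) for item in candidates]    # slow, precise: scores only the candidates
    return [item for item, _ in sorted(scored, key=lambda s: -s[1])[:n_results]]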

Common Retrieval Approaches

1. Embedding-Based Retrieval

The most popular modern approach. Users and items are mapped to dense vectors (embeddings) in the same space, and retrieval becomes finding the nearest neighbours.

How it works:

  1. Train an embedding model that maps users and items to vectors
  2. Store all item embeddings in an approximate nearest neighbours (ANN) index
  3. At request time, compute the user embedding and find the closest item vectors

# Simplified embedding retrieval: `model` is trained and `ann_index` is built offline
user_embedding = model.encode_user(user_id)                  # request-time user encoding
candidates = ann_index.search(user_embedding, top_k=1000)    # nearest-neighbour lookup over item vectors
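
For a more concrete picture, the sketch below is self-contained: brute-force dot-product search over a NumPy matrix stands in for a real ANN library (such as FAISS or ScaNN), and the item count, dimensionality, and names are illustrative.

# Brute-force nearest-neighbour search standing in for an ANN index
import numpy as np

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100_000, 64)).astype(np.float32)   # offline: one vector per item
item_embeddings /= np.linalg.norm(item_embeddings, axis=1, keepdims=True)

def retrieve(user_embedding, top_k=1000):
    """Return the ids of the top_k items by dot-product similarity."""
    user_embedding = user_embedding / np.linalg.norm(user_embedding)
    scores = item_embeddings @ user_embedding               # similarity to every item
    return np.argpartition(-scores, top_k)[:top_k]          # unsorted top_k candidate ids

candidates = retrieve(rng.normal(size=64).astype(np.float32))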

Examples in industry:

  • YouTube uses deep neural networks to generate user and video embeddings
  • Facebook uses embedding models for friend suggestions and content recommendations

2. Item-to-Item Collaborative Filtering

Find items similar to what the user has interacted with. This approach doesn’t require computing a user embedding at request time.

How it works:

  1. Pre-compute similarity scores between all item pairs (offline)
  2. Store the top-K most similar items for each item
  3. At request time, look up items similar to the user’s recent interactions

# For each item the user recently viewed, gather its pre-computed neighbours
candidates = []
for item in user_recent_items:
    similar_items = item_similarity_index[item]    # top-K similar items, computed offline
    candidates.extend(similar_items)

This is simple and fast because everything is pre-computed. The downside is that the similarities are static: retrieval only reflects the user's past interactions and can't incorporate request-time context or features.
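
As a rough illustration of the offline step, the sketch below builds a top-K similarity index from raw interactions using simple co-occurrence counts; production systems typically use normalised or embedding-based similarities, and the toy data and K value are illustrative.

# Offline: turn user-item interactions into a top-K "similar items" index
from collections import defaultdict
from itertools import combinations

interactions = {                        # user_id -> items they interacted with (toy data)
    "u1": ["a", "b", "c"],
    "u2": ["b", "c", "d"],
    "u3": ["a", "c", "d"],
}

cooccurrence = defaultdict(lambda: defaultdict(int))
for items in interactions.values():
    for a, b in combinations(sorted(set(items)), 2):
        cooccurrence[a][b] += 1
        cooccurrence[b][a] += 1

K = 2                                   # neighbours kept per item
item_similarity_index = {
    item: [n for n, _ in sorted(neighbours.items(), key=lambda kv: -kv[1])[:K]]
    for item, neighbours in cooccurrence.items()
}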

3. Graph-Based Retrieval

Build a graph connecting users and items based on interactions, then traverse it to find candidates.

How it works:

  1. Build a bipartite graph from user-item interactions (users on one side, items on the other)
  2. Use random walks or graph neural networks to generate embeddings
  3. Retrieve items that are “close” in the graph

Alibaba uses graph-based approaches for e-commerce recommendations, connecting items that are frequently browsed together.
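
A minimal random-walk sketch of this idea is shown below: it counts which items are reached most often when walking the interaction graph from a seed user. Real systems (random-walk embeddings, graph neural networks) are far more elaborate, and the toy graph is illustrative.

# Random walks over a bipartite user-item graph to find nearby items
import random
from collections import Counter, defaultdict

edges = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c"), ("u3", "c"), ("u3", "d")]
graph = defaultdict(list)
for user, item in edges:                # adjacency lists for both sides of the graph
    graph[user].append(item)
    graph[item].append(user)

def random_walk_candidates(seed_user, walks=200, length=4, top_k=3):
    visits = Counter()
    for _ in range(walks):
        node = seed_user
        for _ in range(length):
            node = random.choice(graph[node])
            if not node.startswith("u"):        # count visited items only (toy naming convention)
                visits[node] += 1
    return [item for item, _ in visits.most_common(top_k)]

print(random_walk_candidates("u1"))     # items "close" to u1 in the graph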

4. Two-Tower Models

A popular architecture where user features and item features are encoded separately into embeddings, then compared via dot product.

User Features → User Tower → User Embedding ──┐
                                              ├─→ Dot Product → Score
Item Features → Item Tower → Item Embedding ──┘

The separation allows item embeddings to be pre-computed and indexed, while user embeddings are computed at request time.
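
Below is a minimal two-tower sketch in PyTorch (an assumed framework choice, not one the article prescribes); the tower widths and feature dimensions are illustrative, and the dot-product scorer mirrors the diagram above.

# Minimal two-tower model: separate encoders, dot-product score
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, user_dim, item_dim, embed_dim=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.item_tower = nn.Sequential(nn.Linear(item_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, user_features, item_features):
        u = self.user_tower(user_features)      # computed per request
        v = self.item_tower(item_features)      # can be pre-computed and indexed offline
        return (u * v).sum(dim=-1)              # dot-product score

model = TwoTower(user_dim=20, item_dim=30)
scores = model(torch.randn(8, 20), torch.randn(8, 30))   # one score per (user, item) pair in the batch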

Retrieval Metrics

Unlike ranking, retrieval prioritises recall over precision:

  • Recall: Of all relevant items, how many did we retrieve?
  • The goal is to not miss good items (false negatives are costly)
  • It’s okay to include some bad items (the ranker will filter them out)

A typical target might be 50% recall at 1000 candidates — meaning half of the items the user would like are in the candidate set.
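
A minimal sketch of how recall at K might be computed for a single user is shown below; the toy item ids are illustrative.

# Recall@K for one user: fraction of that user's relevant items in the candidate set
def recall_at_k(retrieved, relevant, k=1000):
    retrieved_k = set(retrieved[:k])
    return len(retrieved_k & set(relevant)) / len(relevant) if relevant else 0.0

# e.g. 2 of the 4 relevant items were retrieved -> recall@1000 = 0.5
print(recall_at_k(retrieved=[1, 5, 9, 12], relevant=[5, 9, 20, 31], k=1000))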

Offline vs Online Retrieval

Aspect  | Offline                             | Online
When    | Batch jobs (daily/weekly)           | Real-time per request
Example | Pre-compute top-1000 items per user | Compute user embedding, query ANN index
Pros    | No latency cost at serving          | Adapts to current context
Cons    | Stale, can’t personalise to moment  | Requires fast infrastructure

Many systems use a hybrid: pre-compute candidates for popular users offline, generate dynamically for others.
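
One way such a hybrid might look in code is sketched below; the store and retriever objects are hypothetical stand-ins.

# Hybrid retrieval: use pre-computed candidates when available, fall back to online retrieval
def get_candidates(user_id, offline_store, online_retriever, top_k=1000):
    cached = offline_store.get(user_id)         # output of a batch job; may be missing or stale
    if cached:
        return cached[:top_k]
    return online_retriever(user_id, top_k)     # real-time retrieval, e.g. an ANN query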
