Candidate Retrieval
The first stage of a recommender system that quickly narrows millions of items down to a smaller set for ranking.
Candidate retrieval (also called candidate generation) is the first stage in a two-stage recommender system. Its job is to quickly filter millions of items down to a manageable set (typically hundreds or thousands) that a more expensive ranking model can score.
Why Separate Retrieval from Ranking?
Scoring every item in a large catalog with a complex model is too slow for real-time requests. Retrieval acts as a fast, coarse filter:
- Retrieval: Fast, less precise, handles millions of items
- Ranking: Slower, more precise, handles hundreds to thousands of items
Common Retrieval Approaches
1. Embedding-Based Retrieval
The most popular modern approach. Users and items are mapped to dense vectors (embeddings) in the same space, and retrieval becomes finding the nearest neighbours.
How it works:
- Train an embedding model that maps users and items to vectors
- Store all item embeddings in an approximate nearest neighbours (ANN) index
- At request time, compute the user embedding and find the closest item vectors
# Simplified embedding retrieval:
# `model` maps users and items into a shared vector space;
# `ann_index` holds pre-computed item embeddings for approximate nearest-neighbour search
user_embedding = model.encode_user(user_id)
candidates = ann_index.search(user_embedding, top_k=1000)
Examples in industry:
- YouTube uses deep neural networks to generate user and video embeddings
- Facebook uses embedding models for friend suggestions and content recommendations
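To make the index-and-search flow concrete, here is a minimal sketch using FAISS. The exact inner-product index stands in for the approximate indexes (HNSW, IVF) used at scale, and the random vectors are placeholders for embeddings produced by a trained model.

# Minimal embedding-retrieval sketch with FAISS (data and dimensions are placeholders)
import numpy as np
import faiss

dim = 64
item_embeddings = np.random.rand(10_000, dim).astype("float32")  # stand-in for trained item vectors

index = faiss.IndexFlatIP(dim)   # exact inner-product index; swap for an ANN index (HNSW, IVF) at scale
index.add(item_embeddings)       # offline: index every item embedding

user_embedding = np.random.rand(1, dim).astype("float32")        # stand-in for a trained user vector
scores, candidate_ids = index.search(user_embedding, 1000)       # online: top-1000 closest items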
2. Item-to-Item Collaborative Filtering
Find items similar to what the user has interacted with. This approach doesn’t require computing a user embedding at request time.
How it works:
- Pre-compute similarity scores between all item pairs (offline)
- Store the top-K most similar items for each item
- At request time, look up items similar to the user’s recent interactions
# For each item the user recently viewed, look up its pre-computed neighbours
candidates = []
for item in user_recent_items:
    similar_items = item_similarity_index[item]  # top-K similar items, computed offline
    candidates.extend(similar_items)
This is simple and fast because everything is pre-computed. The downside is that the similarities are static and purely item-centric, so retrieval can’t adapt to the user’s current context.
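A minimal sketch of the offline pre-computation step, assuming each item already has a vector representation (for example from an embedding model or co-occurrence statistics); the function name and the top-K cutoff are illustrative.

# Offline: build a top-K item-to-item similarity index from item vectors (illustrative)
import numpy as np

def build_similarity_index(item_vectors: np.ndarray, top_k: int = 50) -> dict:
    # Normalise rows so that dot products equal cosine similarities
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    normalised = item_vectors / np.clip(norms, 1e-8, None)
    scores = normalised @ normalised.T
    np.fill_diagonal(scores, -np.inf)  # an item should not be its own neighbour
    return {
        item_id: np.argsort(row)[::-1][:top_k].tolist()
        for item_id, row in enumerate(scores)
    }

# item_similarity_index = build_similarity_index(item_vectors)  # served by the lookup loop above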
3. Graph-Based Retrieval
Build a graph connecting users and items based on interactions, then traverse it to find candidates.
How it works:
- Build a bipartite graph from user-item interactions (users connected to the items they interacted with)
- Use random walks or graph neural networks to generate embeddings (a random-walk sketch follows below)
- Retrieve items that are “close” in the graph
Alibaba uses graph-based approaches for e-commerce recommendations, connecting items that are frequently browsed together.
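As a rough illustration of the random-walk idea from the list above, the sketch below generates walks over an interaction graph; in practice these walks would feed a word2vec-style trainer to learn node embeddings. The graph structure and parameters here are hypothetical.

# Sketch: DeepWalk-style random walks over an interaction graph (structure and parameters are hypothetical)
import random

def random_walks(graph: dict, walk_length: int = 10, walks_per_node: int = 5) -> list:
    # graph maps each node (user or item) to a list of its neighbours
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(random.choice(neighbours))
            walks.append(walk)
    return walks  # fed into word2vec-style training to learn node embeddings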
4. Two-Tower Models
A popular architecture where user features and item features are encoded separately into embeddings, then compared via dot product.
User Features → User Tower → User Embedding
                                  ↘
                                   Dot Product → Score
                                  ↗
Item Features → Item Tower → Item Embedding
The separation allows item embeddings to be pre-computed and indexed, while user embeddings are computed at request time.
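A minimal two-tower sketch in PyTorch, assuming dense feature vectors as input; the feature dimensions, layer sizes, and embedding size are illustrative rather than a production configuration.

# Two-tower sketch in PyTorch (dimensions and layer sizes are illustrative)
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, input_dim: int, embedding_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, features):
        return self.net(features)

user_tower = Tower(input_dim=32)   # encodes user features at request time
item_tower = Tower(input_dim=48)   # encodes item features; outputs can be pre-computed and indexed

user_emb = user_tower(torch.randn(1, 32))
item_emb = item_tower(torch.randn(1, 48))
score = (user_emb * item_emb).sum(dim=-1)  # dot product between the two embeddings

The dot product at the end is what makes this architecture compatible with ANN indexes, which are built around inner-product or cosine search.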
Retrieval Metrics
Unlike ranking, retrieval prioritises recall over precision:
- Recall: Of all relevant items, how many did we retrieve?
- The goal is to not miss good items (false negatives are costly)
- It’s okay to include some bad items (the ranker will filter them out)
A typical target might be 50% recall at 1000 candidates — meaning half of the items the user would like are in the candidate set.
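For evaluation, recall@K can be computed with a small helper like this (names are illustrative):

# Recall@K: fraction of the relevant items that appear in the top-K retrieved candidates
def recall_at_k(retrieved: list, relevant: set, k: int = 1000) -> float:
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)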
Offline vs Online Retrieval
| Aspect | Offline | Online |
|---|---|---|
| When | Batch jobs (daily/weekly) | Real-time per request |
| Example | Pre-compute top-1000 items per user | Compute user embedding, query ANN index |
| Pros | No latency cost at serving | Adapts to current context |
| Cons | Stale; can’t personalise to the current moment | Requires fast serving infrastructure |
Many systems use a hybrid: pre-compute candidates for popular users offline and generate them dynamically at request time for everyone else.
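A hedged sketch of that hybrid pattern, reusing the hypothetical model and ann_index objects from the embedding example above:

# Hybrid retrieval sketch: serve pre-computed candidates when available, otherwise retrieve online
def get_candidates(user_id, precomputed_cache, model, ann_index, top_k=1000):
    if user_id in precomputed_cache:                 # refreshed by a daily/weekly batch job
        return precomputed_cache[user_id]
    user_embedding = model.encode_user(user_id)      # online path: embed the user now
    return ann_index.search(user_embedding, top_k=top_k)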