Candidate Retrieval
The first stage of a recommender system that quickly narrows millions of items down to a smaller set for ranking.
Candidate retrieval (also called candidate generation) is the first stage in a two-stage recommender system. Its job is to quickly filter millions of items down to a manageable set (typically hundreds or thousands) that a more expensive ranking model can score.
Why Separate Retrieval from Ranking?
Scoring every item in a large catalog with a complex model is too slow for real-time requests. Retrieval acts as a fast, coarse filter:
- Retrieval: Fast, less precise, handles millions of items
- Ranking: Slower, more precise, handles hundreds to thousands of items
Common Retrieval Approaches
1. Embedding-Based Retrieval
The most popular modern approach. Users and items are mapped to dense vectors (embeddings) in the same space, and retrieval becomes finding the nearest neighbours.
How it works:
- Train an embedding model that maps users and items to vectors
- Store all item embeddings in an approximate nearest neighbours (ANN) index
- At request time, compute the user embedding and find the closest item vectors
# Simplified embedding retrieval:
# `model` maps users and items into a shared vector space;
# `ann_index` holds pre-computed item embeddings for approximate nearest-neighbour search
user_embedding = model.encode_user(user_id)
candidates = ann_index.search(user_embedding, top_k=1000)
Examples in industry:
- YouTube uses deep neural networks to generate user and video embeddings
- Facebook uses embedding models for friend suggestions and content recommendations
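To make the index-and-search flow concrete, here is a minimal sketch using FAISS. The exact inner-product index stands in for the approximate indexes (HNSW, IVF) used at scale, and the random vectors are placeholders for embeddings produced by a trained model.

# Minimal embedding-retrieval sketch with FAISS (data and dimensions are placeholders)
import numpy as np
import faiss

dim = 64
item_embeddings = np.random.rand(10_000, dim).astype("float32")  # stand-in for trained item vectors

index = faiss.IndexFlatIP(dim)   # exact inner-product index; swap for an ANN index (HNSW, IVF) at scale
index.add(item_embeddings)       # offline: index every item embedding

user_embedding = np.random.rand(1, dim).astype("float32")        # stand-in for a trained user vector
scores, candidate_ids = index.search(user_embedding, 1000)       # online: top-1000 closest items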
2. Item-to-Item Collaborative Filtering
Find items similar to what the user has interacted with. This approach doesn’t require computing a user embedding at request time.
How it works:
- Pre-compute similarity scores between all item pairs (offline)
- Store the top-K most similar items for each item
- At request time, look up items similar to the user’s recent interactions
# For each item the user recently viewed, look up its pre-computed neighbours
candidates = []
for item in user_recent_items:
    similar_items = item_similarity_index[item]  # top-K similar items, computed offline
    candidates.extend(similar_items)
This is simple and fast because everything is pre-computed. The downside is that the similarities are static and purely item-centric, so retrieval can’t adapt to the user’s current context.
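A minimal sketch of the offline pre-computation step, assuming each item already has a vector representation (for example from an embedding model or co-occurrence statistics); the function name and the top-K cutoff are illustrative.

# Offline: build a top-K item-to-item similarity index from item vectors (illustrative)
import numpy as np

def build_similarity_index(item_vectors: np.ndarray, top_k: int = 50) -> dict:
    # Normalise rows so that dot products equal cosine similarities
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    normalised = item_vectors / np.clip(norms, 1e-8, None)
    scores = normalised @ normalised.T
    np.fill_diagonal(scores, -np.inf)  # an item should not be its own neighbour
    return {
        item_id: np.argsort(row)[::-1][:top_k].tolist()
        for item_id, row in enumerate(scores)
    }

# item_similarity_index = build_similarity_index(item_vectors)  # served by the lookup loop above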
3. Graph-Based Retrieval
Build a graph connecting users and items based on interactions, then traverse it to find candidates.
How it works:
- Build a bipartite graph from user-item interactions (users connected to the items they interacted with)
- Use random walks or graph neural networks to generate embeddings (a random-walk sketch follows below)
- Retrieve items that are “close” in the graph
Alibaba uses graph-based approaches for e-commerce recommendations, connecting items that are frequently browsed together.
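As a rough illustration of the random-walk idea from the list above, the sketch below generates walks over an interaction graph; in practice these walks would feed a word2vec-style trainer to learn node embeddings. The graph structure and parameters here are hypothetical.

# Sketch: DeepWalk-style random walks over an interaction graph (structure and parameters are hypothetical)
import random

def random_walks(graph: dict, walk_length: int = 10, walks_per_node: int = 5) -> list:
    # graph maps each node (user or item) to a list of its neighbours
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(random.choice(neighbours))
            walks.append(walk)
    return walks  # fed into word2vec-style training to learn node embeddings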
4. Two-Tower Models
A popular architecture where user features and item features are encoded separately into embeddings, then compared via dot product.
User Features → User Tower → User Embedding
                                  ↘
                                   Dot Product → Score
                                  ↗
Item Features → Item Tower → Item Embedding
The separation allows item embeddings to be pre-computed and indexed, while user embeddings are computed at request time.
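A minimal two-tower sketch in PyTorch, assuming dense feature vectors as input; the feature dimensions, layer sizes, and embedding size are illustrative rather than a production configuration.

# Two-tower sketch in PyTorch (dimensions and layer sizes are illustrative)
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, input_dim: int, embedding_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, features):
        return self.net(features)

user_tower = Tower(input_dim=32)   # encodes user features at request time
item_tower = Tower(input_dim=48)   # encodes item features; outputs can be pre-computed and indexed

user_emb = user_tower(torch.randn(1, 32))
item_emb = item_tower(torch.randn(1, 48))
score = (user_emb * item_emb).sum(dim=-1)  # dot product between the two embeddings

The dot product at the end is what makes this architecture compatible with ANN indexes, which are built around inner-product or cosine search.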
Retrieval Metrics
Unlike ranking, retrieval prioritises recall over precision:
- Recall: Of all relevant items, how many did we retrieve?
- The goal is to not miss good items (false negatives are costly)
- It’s okay to include some bad items (the ranker will filter them out)
A typical target might be 50% recall at 1000 candidates — meaning half of the items the user would like are in the candidate set.
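For evaluation, recall@K can be computed with a small helper like this (names are illustrative):

# Recall@K: fraction of the relevant items that appear in the top-K retrieved candidates
def recall_at_k(retrieved: list, relevant: set, k: int = 1000) -> float:
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)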
Offline vs Online Retrieval
| Aspect | Offline | Online |
|---|---|---|
| When | Batch jobs (daily/weekly) | Real-time per request |
| Example | Pre-compute top-1000 items per user | Compute user embedding, query ANN index |
| Pros | No latency cost at serving | Adapts to current context |
| Cons | Stale; can’t personalise to the current moment | Requires fast serving infrastructure |
Many systems use a hybrid: pre-compute candidates for popular users offline and generate them dynamically at request time for everyone else.
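A hedged sketch of that hybrid pattern, reusing the hypothetical model and ann_index objects from the embedding example above:

# Hybrid retrieval sketch: serve pre-computed candidates when available, otherwise retrieve online
def get_candidates(user_id, precomputed_cache, model, ann_index, top_k=1000):
    if user_id in precomputed_cache:                 # refreshed by a daily/weekly batch job
        return precomputed_cache[user_id]
    user_embedding = model.encode_user(user_id)      # online path: embed the user now
    return ann_index.search(user_embedding, top_k=top_k)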