Recommendation Pipeline
The four-stage funnel that transforms millions of items into a personalised, ordered list for the user.
A production recommender system typically processes items through four stages: Retrieval → Filtering → Scoring → Ordering. Each stage narrows and refines the candidates until a final ranked list is presented to the user.
Millions of items
↓
RETRIEVAL → Thousands of candidates
↓
FILTERING → Hundreds of valid candidates
↓
SCORING → Hundreds of scored candidates
↓
ORDERING → Final ranked list (10-50 items)
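Read top to bottom, the funnel is just a chain of stage functions. A minimal orchestration sketch, where retrieve, apply_filters, score, and order are hypothetical names standing in for the four stages described below (not a specific framework's API):

def recommend(user, context, page_size=20):
    candidates = retrieve(user)              # millions -> thousands
    valid = apply_filters(user, candidates)  # thousands -> hundreds
    scored = score(user, valid, context)     # heavy model inference
    return order(scored, page_size)          # final 10-50 items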
1. Retrieval
Goal: Quickly narrow millions of items to a manageable set.
The retrieval stage prioritises speed and recall. It doesn’t need to be precise — it just needs to not miss good items.
Techniques:
- Approximate nearest neighbour (ANN) search on embeddings
- Item-to-item similarity lookup
- Graph-based retrieval
- Rule-based candidates (e.g., popular items, same category)
Output: ~1,000–10,000 candidates
# Pull candidates from several sources, then de-duplicate the merged list
candidates = ann_index.search(user_embedding, top_k=1000)
candidates += get_popular_items(category, top_k=100)
candidates += get_similar_items(user_recent_views, top_k=500)
candidates = list(dict.fromkeys(candidates))  # remove duplicates, keep order
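The ann_index above could be any approximate nearest-neighbour library. As one possible setup, a minimal sketch using FAISS, assuming item embeddings are already available as a float32 matrix (the dimensions and random vectors here are placeholders):

import numpy as np
import faiss  # one popular ANN library; ScaNN and hnswlib are common alternatives

dim = 128                                                          # embedding dimension (assumed)
item_embeddings = np.random.rand(100_000, dim).astype("float32")   # placeholder item vectors

index = faiss.IndexFlatIP(dim)  # exact inner-product search; use IndexIVFFlat or IndexHNSWFlat for true ANN
index.add(item_embeddings)

user_embedding = np.random.rand(1, dim).astype("float32")
_, candidate_ids = index.search(user_embedding, 1000)  # ids of the top-1000 nearest items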
2. Filtering
Goal: Remove invalid or irrelevant candidates based on business rules.
This stage applies hard constraints that the model shouldn’t have to learn. Items that fail these rules are removed entirely.
Common filters:
- Already purchased/watched/seen
- Out of stock
- Not available in user’s region
- Explicitly disliked or blocked
- Age-restricted content
- Doesn’t match user’s preferences (e.g., dietary restrictions)
Output: ~100–1,000 valid candidates
filtered = []
for item in candidates:
    if item in user_purchase_history:
        continue
    if not item.in_stock:
        continue
    if item.region not in user.allowed_regions:
        continue
    if item in user.blocked_items:
        continue
    filtered.append(item)
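As the list of business rules grows, a chain of if/continue checks becomes hard to maintain. One common refactor, sketched here with the same names as the loop above, is to express each rule as a predicate and keep them in a single list:

RULES = [
    lambda user, item: item not in user_purchase_history,
    lambda user, item: item.in_stock,
    lambda user, item: item.region in user.allowed_regions,
    lambda user, item: item not in user.blocked_items,
]

filtered = [item for item in candidates
            if all(rule(user, item) for rule in RULES)]

Adding a new filter then means appending one predicate rather than editing the loop body.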
3. Scoring
Goal: Predict how much the user will like each remaining item.
This is where the heavy ML model runs. The scorer takes user features, item features, and context features to predict a relevance score.
Techniques:
- Deep neural networks
- Gradient boosted trees
- Logistic regression
- Multi-objective models (predict click, purchase, watch time, etc.)
Output: Each candidate gets a score (or multiple scores)
scores = []
for item in filtered:
    features = combine_features(user, item, context)
    score = ranking_model.predict(features)
    scores.append((item, score))
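Calling the model once per item, as above, is rarely fast enough in production; most serving stacks assemble all candidates into one batch and run a single forward pass. A sketch of that, assuming combine_features returns a fixed-length feature vector and the model accepts a batched input:

import numpy as np

# Build one feature matrix of shape (num_candidates, num_features)
feature_matrix = np.stack(
    [combine_features(user, item, context) for item in filtered]
)

# One batched inference call instead of hundreds of single-item calls
batch_scores = ranking_model.predict(feature_matrix)
scores = list(zip(filtered, batch_scores))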
4. Ordering
Goal: Arrange items into the final list the user sees.
Ordering isn’t always “sort by score descending.” This stage applies business logic and optimisation objectives.
Considerations:
- Diversity: Don’t show 10 similar items in a row
- Freshness: Mix in some newer content
- Exploration: Occasionally show items the model is uncertain about
- Business rules: Promote sponsored content, new releases
- Position bias: Some slots are more valuable than others
Techniques:
- Greedy re-ranking with diversity constraints
- Maximal Marginal Relevance (MMR)
- Multi-armed bandits for exploration
- Determinantal Point Processes (DPP) for diversity
Output: Final ordered list (10–50 items)
final_list = []
category_counts = {}
for item, score in sorted(scores, key=lambda x: -x[1]):
    # Diversity: allow at most 3 items per category
    if category_counts.get(item.category, 0) >= 3:
        continue
    final_list.append(item)
    category_counts[item.category] = category_counts.get(item.category, 0) + 1
    if len(final_list) >= 20:
        break
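The category cap above is a simple greedy heuristic. Maximal Marginal Relevance, listed among the techniques earlier, makes the trade-off explicit: each pick balances an item's relevance score against its similarity to items already selected. A minimal sketch, assuming a similarity(a, b) function that returns a value in [0, 1]:

def mmr_rerank(scored_items, similarity, lambda_=0.7, k=20):
    """Greedy MMR: repeatedly pick the item maximising
    lambda * relevance - (1 - lambda) * max similarity to already-selected items."""
    remaining = dict(scored_items)  # item -> relevance score
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lambda_ * remaining[item] - (1 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

With lambda_ = 1.0 this reduces to plain sort-by-score; lower values trade relevance for diversity.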
Why Four Stages?
Each stage has different requirements:
| Stage | Latency Budget | Items Processed | Complexity |
|---|---|---|---|
| Retrieval | ~10ms | Millions → Thousands | Low |
| Filtering | ~5ms | Thousands → Hundreds | Low |
| Scoring | ~50ms | Hundreds | High |
| Ordering | ~5ms | Hundreds → Tens | Medium |
Running a complex neural network on millions of items would be far too slow. The funnel structure lets you apply expensive computation only where it matters.
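A back-of-envelope calculation makes the point concrete; the per-item cost and catalogue size below are illustrative assumptions, not measurements:

catalog_size = 5_000_000        # items in the catalogue (assumed)
candidates_after_funnel = 500   # items that reach the scorer
cost_per_item_ms = 0.1          # assumed amortised model cost per item

print(catalog_size * cost_per_item_ms)             # ~500,000 ms: minutes to score everything
print(candidates_after_funnel * cost_per_item_ms)  # ~50 ms: fits the scoring budget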
Multi-Objective Scoring
Often the scoring stage predicts multiple outcomes:
- P(click)
- P(purchase)
- P(long watch time)
- P(share)
The ordering stage then combines these:
final_score = (
    0.3 * p_click +
    0.5 * p_purchase +
    0.2 * p_watch_time
)
This lets the business tune the balance between engagement and revenue without retraining the model.
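Because the weights live outside the model, they can be treated as plain configuration. A sketch of that idea, using hypothetical objective names that match the snippet above:

OBJECTIVE_WEIGHTS = {"click": 0.3, "purchase": 0.5, "watch_time": 0.2}  # tunable without retraining

def combine_objectives(predictions, weights=OBJECTIVE_WEIGHTS):
    """predictions: dict mapping objective name -> predicted probability for one item."""
    return sum(weights[name] * predictions[name] for name in weights)

# Shifting weight toward purchases changes the ranking, not the model
final_score = combine_objectives({"click": 0.12, "purchase": 0.03, "watch_time": 0.4})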