User and Item Models
How user and item features are computed offline and combined online to generate recommendations.
Recommender systems model both users and items as collections of features. Some features are computed offline in batch jobs and others online at request time; the two sets are combined at serving time to predict relevance.
User Model
The user model captures who the user is and what they’ve done.
Static Features (Slow-Changing)
Computed offline in batch jobs, updated daily or weekly.
| Feature | Example | Update Frequency |
|---|---|---|
| Demographics | Age, gender, location | Rarely changes |
| Account age | Days since signup | Daily |
| Lifetime stats | Total purchases, avg spend | Daily |
| Category preferences | "Likes electronics 70%, books 20%" | Weekly |
| Long-term embedding | Dense vector from historical behaviour | Weekly |
| Segment membership | "Power user", "Bargain hunter" | Weekly |
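# Offline batch job (runs nightly)
# Assumes `user`, count_purchases, get_top_categories, embedding_model,
# and feature_store are provided by the surrounding pipeline.
from datetime import date
from statistics import mean

today = date.today()
user_features = {
    "user_id": user.id,
    "account_age_days": (today - user.signup_date).days,
    "lifetime_purchases": count_purchases(user),
    # statistics.mean raises on an empty list, so default new users to 0.0
    "avg_order_value": mean(user.order_values) if user.order_values else 0.0,
    "top_categories": get_top_categories(user, k=5),
    "user_embedding": embedding_model.encode(user.history),
}
feature_store.write(user.id, user_features)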
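The "Category preferences" row can be made concrete with a small sketch. It assumes preferences are simply each category's share of a user's past purchases; the function name `category_preferences` and the sample history are illustrative, not part of any particular pipeline.

```python
from collections import Counter


def category_preferences(purchased_categories, k=5):
    """Return the top-k categories with their share of all purchases."""
    counts = Counter(purchased_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # new user with no purchase history
    return {cat: n / total for cat, n in counts.most_common(k)}


# Hypothetical history: one category label per past purchase
history = ["electronics"] * 7 + ["books"] * 2 + ["toys"]
prefs = category_preferences(history, k=2)
print(prefs)  # {'electronics': 0.7, 'books': 0.2}
```

A production job might weight recent purchases more heavily or include views and clicks; plain purchase share is only the simplest possible choice.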
Dynamic Features (Fast-Changing)
Computed online, in real time or near real time.
| Feature | Example | Update Frequency |
|---|---|---|
| Session items | Items viewed this session | Real-time |
| Recent clicks | Last 10 items clicked | Real-time |
| Current context | Time of day, device, location | Per request |
| Short-term intent | "Browsing winter coats" | Per session |
| Cart contents | Items currently in cart | Real-time |
# Online at request time
# Assumes session_id, user_id, and request come from the incoming request.
from datetime import datetime

session_features = {
    "session_items": get_session_views(session_id),
    "recent_clicks": get_recent_clicks(user_id, last_n=10),
    "current_hour": datetime.now().hour,
    "device": request.device_type,
    "location": request.geo,
}
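One way a helper such as `get_recent_clicks` could be backed is a bounded per-user buffer. The sketch below keeps everything in process memory purely for illustration; the `RecentClicks` class is hypothetical, and a real deployment would more likely keep this state in a shared store.

```python
from collections import defaultdict, deque


class RecentClicks:
    """Keep only the last N clicked items per user."""

    def __init__(self, last_n=10):
        # deque(maxlen=N) silently drops the oldest entry once it is full
        self._clicks = defaultdict(lambda: deque(maxlen=last_n))

    def record(self, user_id, item_id):
        self._clicks[user_id].append(item_id)

    def get(self, user_id):
        # Most recent click first; unknown users get an empty list
        return list(reversed(self._clicks.get(user_id, ())))


tracker = RecentClicks(last_n=3)
for item in ["a", "b", "c", "d"]:
    tracker.record("u1", item)
print(tracker.get("u1"))  # ['d', 'c', 'b']
```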
Item Model
The item model captures what the item is and how it’s performed.
Static Features (Slow-Changing)
Computed offline, often at item ingestion time.
| Feature | Example | Update Frequency |
|---|---|---|
| Content attributes | Title, description, category | At creation |
| Item embedding | Dense vector from content/interactions | Weekly |
| Price tier | "Budget", "Premium" | On price change |
| Image features | Extracted from product photos | At creation |
| Text embeddings | From title/description | At creation |
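# Offline batch job
# Assumes `item`, categorize_price, text_model, image_model, and
# feature_store are provided by the surrounding pipeline.
item_features = {
    "item_id": item.id,
    "category": item.category,
    "brand": item.brand,
    "price": item.price,
    "price_tier": categorize_price(item.price, item.category),
    # Join with a space so the title does not run into the description
    "content_embedding": text_model.encode(item.title + " " + item.description),
    "image_embedding": image_model.encode(item.image),
}
feature_store.write(item.id, item_features)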
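The price tier is defined relative to a category, since a "Budget" laptop and a "Budget" paperback sit at very different prices. A minimal sketch of one possible tiering rule follows. It is a variant of the `categorize_price` helper that takes the category's price list directly instead of the category name, and the three-way split and the "Mid-range" label are assumptions made for the example.

```python
def categorize_price(price, category_prices):
    """Label a price by where it falls among prices in the same category."""
    if not category_prices:
        return "Unknown"
    # Fraction of items in the category that are strictly cheaper
    cheaper = sum(1 for p in category_prices if p < price)
    percentile = cheaper / len(category_prices)
    if percentile < 1 / 3:
        return "Budget"
    if percentile < 2 / 3:
        return "Mid-range"
    return "Premium"


# Hypothetical prices for a single category
prices = [10, 20, 30, 40, 50, 60]
print(categorize_price(10, prices))  # Budget
print(categorize_price(35, prices))  # Mid-range
print(categorize_price(60, prices))  # Premium
```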
Aggregate Features (Updated Periodically)
Computed from user interactions, updated in batch.
| Feature | Example | Update Frequency |
|---|---|---|
| Popularity | View count, purchase count | Hourly/Daily |
| Avg rating | Mean of user ratings | Daily |
| CTR | Historical click-through rate | Daily |
| Conversion rate | Purchases / views | Daily |
| Co-purchase stats | "Often bought with X" | Weekly |
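# Offline aggregation job
from statistics import mean

item_stats = {
    "view_count_7d": count_views(item, days=7),
    "purchase_count_7d": count_purchases(item, days=7),
    # Guard against items with no ratings, impressions, or views yet
    "avg_rating": mean(item.ratings) if item.ratings else None,
    "ctr": item.clicks / item.impressions if item.impressions else 0.0,
    "conversion_rate": item.purchases / item.views if item.views else 0.0,
}
feature_store.update(item.id, item_stats)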
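The co-purchase row in the table above is not covered by the aggregation job, so here is a rough sketch of how "Often bought with X" counts might be derived. It assumes each order is available as a list of item IDs; `co_purchase_counts` and the sample orders are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_purchase_counts(orders):
    """For each item, count how often every other item shares an order with it."""
    counts = defaultdict(Counter)
    for order in orders:
        # set() so that a duplicated item in one order is counted once
        for a, b in permutations(set(order), 2):
            counts[a][b] += 1
    return counts


orders = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["book"],
]
counts = co_purchase_counts(orders)
print(counts["phone"].most_common(1))  # [('case', 2)]
```

Pair counting grows quadratically with order size, so a batch job over large baskets would probably cap the basket size or sample pairs.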
Real-Time Features
Updated continuously via streaming.
| Feature | Example | Update Frequency |
|---|---|---|
| Stock status | In stock, low stock, out | Real-time |
| Current price | After discounts | Real-time |
| Trending score | Recent velocity of views | Minutes |
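As an illustration of the trending score, the sketch below measures view velocity as views per minute over a sliding window. It assumes view timestamps arrive in order and are expressed in seconds; the `TrendingCounter` class and the window length are hypothetical choices, and a streaming system would normally maintain this state inside the stream processor.

```python
from collections import deque


class TrendingCounter:
    """Views per minute for one item over a sliding time window."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.events = deque()  # view timestamps, oldest first

    def record_view(self, ts):
        self.events.append(ts)

    def score(self, now):
        # Drop views that have fallen out of the window
        cutoff = now - self.window
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events) / (self.window / 60)


tc = TrendingCounter(window_seconds=120)
for ts in [0, 70, 100, 110]:
    tc.record_view(ts)
print(tc.score(now=130))  # 1.5 (three views in the last two minutes)
```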
Offline Computation
Most feature computation happens in batch jobs:
┌─────────────────────────────────────────────────────┐
│                       OFFLINE                       │
├─────────────────────────────────────────────────────┤
│                                                     │
│                   Data Warehouse                    │
│                          ↓                          │
│          Batch Processing (Spark/Airflow)           │
│                          ↓                          │
│      ┌───────────────┐       ┌───────────────┐      │
│      │ User Features │       │ Item Features │      │
│      └───────────────┘       └───────────────┘      │
│              ↓                       ↓              │
│         ┌─────────────────────────────────┐         │
│         │          Feature Store          │         │
│         │    (Redis, DynamoDB, Feast)     │         │
│         └─────────────────────────────────┘         │
│                                                     │
│              Embedding Model Training               │
│                          ↓                          │
│         ┌─────────────────────────────────┐         │
│         │    ANN Index (FAISS, ScaNN)     │         │
│         └─────────────────────────────────┘         │
│                                                     │
└─────────────────────────────────────────────────────┘
What happens offline:
- Train embedding models on historical data
- Generate user embeddings from their interaction history
- Generate item embeddings from content and interactions
- Compute aggregate statistics (popularity, CTR, ratings)
- Build ANN index from item embeddings
- Push all features to the Feature Store
Online Interaction
When a request arrives, features are assembled and combined:
┌─────────────────────────────────────────────────────┐
│                       ONLINE                        │
├─────────────────────────────────────────────────────┤
│                                                     │
│                    User Request                     │
│                          ↓                          │
│       ┌─────────────────────────────────────┐       │
│       │  Fetch from Feature Store           │       │
│       │  - User static features             │       │
│       │  - Item features (for candidates)   │       │
│       └─────────────────────────────────────┘       │
│                          ↓                          │
│       ┌─────────────────────────────────────┐       │
│       │  Compute Real-Time Features         │       │
│       │  - Session context                  │       │
│       │  - Current time/location            │       │
│       └─────────────────────────────────────┘       │
│                          ↓                          │
│       ┌─────────────────────────────────────┐       │
│       │  Combine All Features               │       │
│       │  [user] + [item] + [context]        │       │
│       └─────────────────────────────────────┘       │
│                          ↓                          │
│       ┌─────────────────────────────────────┐       │
│       │  Scoring Model                      │       │
│       │  predict(combined_features)         │       │
│       └─────────────────────────────────────┘       │
│                          ↓                          │
│                   Ranked Results                    │
│                                                     │
└─────────────────────────────────────────────────────┘
What happens online:
- Receive user request with user_id and context
- Fetch pre-computed user features from Feature Store
- Run retrieval to get candidate items
- Fetch pre-computed item features for each candidate
- Compute real-time features (session, context)
- Combine user + item + context features into a single vector
- Score each candidate with the ranking model
- Apply ordering logic and return results
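The online steps can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: the dictionaries play the role of the Feature Store, `retrieve_candidates` returns every item instead of running real retrieval, and `toy_score` is a hand-written formula standing in for a trained ranking model.

```python
# In-memory stand-ins for pre-computed Feature Store entries
USER_FEATURES = {"u1": {"avg_order_value": 50.0, "top_categories": ["books"]}}
ITEM_FEATURES = {
    "i1": {"price": 20.0, "category": "books", "ctr": 0.10},
    "i2": {"price": 80.0, "category": "toys", "ctr": 0.30},
    "i3": {"price": 45.0, "category": "books", "ctr": 0.05},
}


def retrieve_candidates(user_id):
    # Stand-in for retrieval: every item is a candidate
    return list(ITEM_FEATURES)


def toy_score(features):
    # Stand-in for the ranking model (ignores hour_of_day)
    return features["item_ctr"] + 0.5 * features["category_match"]


def recommend(user_id, context, k=2):
    user = USER_FEATURES[user_id]                 # fetch user features
    scored = []
    for item_id in retrieve_candidates(user_id):  # candidate retrieval
        item = ITEM_FEATURES[item_id]             # fetch item features
        features = {                              # combine user + item + context
            "item_ctr": item["ctr"],
            "category_match": float(item["category"] in user["top_categories"]),
            "hour_of_day": context["hour"],
        }
        scored.append((toy_score(features), item_id))
    scored.sort(reverse=True)                     # highest score first
    return [item_id for _, item_id in scored[:k]]


print(recommend("u1", {"hour": 14}))  # ['i1', 'i3']
```

The popular but off-category item `i2` loses to both book items here only because of the weights chosen for the toy formula; a trained model would learn that trade-off from data.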
Feature Combination Example
def score_candidate(user_id, item_id, context):
    # Fetch offline features
    user_features = feature_store.get_user(user_id)
    item_features = feature_store.get_item(item_id)
    # Compute online features
    session_features = compute_session_features(context)
    # Avoid dividing by zero for users with no purchase history
    avg_spend = user_features["avg_order_value"]
    price_vs_user_avg = item_features["price"] / avg_spend if avg_spend else 0.0
    # Combine into single feature vector
    combined = {
        # User features (keys match those written by the offline batch job)
        "user_embedding": user_features["user_embedding"],
        "user_avg_spend": avg_spend,
        "user_category_affinity": user_features["top_categories"],
        # Item features
        "item_embedding": item_features["content_embedding"],
        "item_price": item_features["price"],
        "item_popularity": item_features["view_count_7d"],
        "item_ctr": item_features["ctr"],
        # Context features
        "hour_of_day": session_features["current_hour"],
        "device": session_features["device"],
        "items_viewed_this_session": len(session_features["session_items"]),
        # Interaction features
        "price_vs_user_avg": price_vs_user_avg,
        "user_item_category_match": category_match_score(user_features, item_features),
    }
    return ranking_model.predict(combined)
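Most ranking models expect a flat numeric vector rather than a dictionary of mixed types. The sketch below shows one way the combined features might be flattened. The `DEVICES` vocabulary, the choice of fields, and the scaling of the hour are assumptions for the example, not a prescribed encoding.

```python
# Hypothetical fixed vocabulary for one-hot encoding the device
DEVICES = ["desktop", "mobile", "tablet"]


def to_vector(combined):
    """Flatten a combined feature dict into a list of floats."""
    device_one_hot = [1.0 if combined["device"] == d else 0.0 for d in DEVICES]
    return (
        list(combined["user_embedding"])
        + list(combined["item_embedding"])
        + [
            float(combined["item_price"]),
            float(combined["item_ctr"]),
            combined["hour_of_day"] / 23.0,  # scale the hour to [0, 1]
        ]
        + device_one_hot
    )


example = {
    "user_embedding": [0.1, 0.2],
    "item_embedding": [0.3, 0.4],
    "item_price": 20,
    "item_ctr": 0.05,
    "hour_of_day": 23,
    "device": "mobile",
}
vector = to_vector(example)
print(len(vector))  # 10
```

The field order must stay identical between training and serving, which is one reason feature definitions are usually shared through the Feature Store rather than re-implemented in each service.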
Two-Tower Architecture
A common pattern separates user and item encoding:
User Features ──→ User Tower ──→ User Embedding ─┐
                                                 ├──→ Dot Product ──→ Score
Item Features ──→ Item Tower ──→ Item Embedding ─┘
Why this helps:
- Item embeddings can be pre-computed offline
- Only the user embedding needs to be computed at request time
- Because the score is a dot product, retrieval becomes a fast ANN lookup over the item embeddings
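A toy version of two-tower scoring is sketched below. The item embeddings are hard-coded as if an item tower had produced them offline, `user_tower` is a trivial stand-in for a trained network, and the exhaustive sort takes the place of an ANN index such as FAISS or ScaNN. All names and values are invented for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Pretend these were produced offline by the item tower
ITEM_EMBEDDINGS = {
    "i1": [1.0, 0.0],
    "i2": [0.0, 1.0],
    "i3": [0.7, 0.7],
}


def user_tower(user_features):
    # Stand-in for a trained network: map features to a 2-d embedding
    return [user_features["likes_electronics"], user_features["likes_books"]]


def retrieve(user_features, k=2):
    user_embedding = user_tower(user_features)  # computed at request time
    # Exhaustive scoring; an ANN index would approximate this ranking
    ranked = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item_id: dot(user_embedding, ITEM_EMBEDDINGS[item_id]),
        reverse=True,
    )
    return ranked[:k]


print(retrieve({"likes_electronics": 0.9, "likes_books": 0.2}))  # ['i1', 'i3']
```

Exhaustive scoring costs one dot product per item, which is the cost an ANN index is meant to avoid once the catalogue is large.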