User and Item Models

Recommender systems model both users and items as collections of features. These features are computed at different times (offline vs online) and combined at serving time to predict relevance.

User Model

The user model captures who the user is and what they’ve done.

Static Features (Slow-Changing)

Computed offline in batch jobs, updated daily or weekly.

Feature	Example	Update Frequency
Demographics	Age, gender, location	Rarely changes
Account age	Days since signup	Daily
Lifetime stats	Total purchases, avg spend	Daily
Category preferences	“Likes electronics 70%, books 20%”	Weekly
Long-term embedding	Dense vector from historical behaviour	Weekly
Segment membership	“Power user”, “Bargain hunter”	Weekly

# Offline batch job (runs nightly)
from statistics import mean

user_features = {
    "user_id": user.id,
    "account_age_days": (today - user.signup_date).days,
    "lifetime_purchases": count_purchases(user),
    # mean() raises StatisticsError on an empty list, so default to 0.0
    "avg_order_value": mean(user.order_values) if user.order_values else 0.0,
    "top_categories": get_top_categories(user, k=5),
    "user_embedding": embedding_model.encode(user.history),
}
feature_store.write(user.id, user_features)

Dynamic Features (Fast-Changing)

Computed online in real-time or near real-time.

Feature	Example	Update Frequency
Session items	Items viewed this session	Real-time
Recent clicks	Last 10 items clicked	Real-time
Current context	Time of day, device, location	Per request
Short-term intent	“Browsing winter coats”	Per session
Cart contents	Items currently in cart	Real-time

# Online at request time
from datetime import datetime

session_features = {
    "session_items": get_session_views(session_id),
    "recent_clicks": get_recent_clicks(user_id, last_n=10),
    # Server-local hour; this can differ from the user's local time
    "current_hour": datetime.now().hour,
    "device": request.device_type,
    "location": request.geo,
}
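
The session state read above has to be recorded somewhere as events arrive. The following is a minimal in-memory sketch of one way to do that; `SessionTracker` and its method names are hypothetical, and a production system would more likely keep this state in a low-latency shared store than in process memory.

```python
from collections import deque


class SessionTracker:
    """Keeps per-session views and the last N clicks per user in memory."""

    def __init__(self, max_recent_clicks=10):
        self.max_recent_clicks = max_recent_clicks
        self._session_views = {}  # session_id -> item ids in view order
        self._recent_clicks = {}  # user_id -> deque of most recent clicks

    def record_view(self, session_id, item_id):
        self._session_views.setdefault(session_id, []).append(item_id)

    def record_click(self, user_id, item_id):
        clicks = self._recent_clicks.setdefault(
            user_id, deque(maxlen=self.max_recent_clicks)
        )
        clicks.append(item_id)  # a full deque drops its oldest entry

    def get_session_views(self, session_id):
        return list(self._session_views.get(session_id, []))

    def get_recent_clicks(self, user_id):
        return list(self._recent_clicks.get(user_id, []))


tracker = SessionTracker(max_recent_clicks=3)
for item in ["a", "b", "c", "d"]:
    tracker.record_click("user_1", item)
tracker.record_view("session_1", "a")
recent = tracker.get_recent_clicks("user_1")  # only the three newest clicks remain
```

Bounding the deque keeps memory per user constant, which matches the "last 10 items clicked" feature in the table.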

Item Model

The item model captures what the item is and how it’s performed.

Static Features (Slow-Changing)

Computed offline, often at item ingestion time.

Feature	Example	Update Frequency
Content attributes	Title, description, category	At creation
Item embedding	Dense vector from content/interactions	Weekly
Price tier	“Budget”, “Premium”	On price change
Image features	Extracted from product photos	At creation
Text embeddings	From title/description	At creation

# Offline batch job
item_features = {
    "item_id": item.id,
    "category": item.category,
    "brand": item.brand,
    "price": item.price,
    "price_tier": categorize_price(item.price, item.category),
    # Join with a space so the title's last word and the description's first word stay separate
    "content_embedding": text_model.encode(item.title + " " + item.description),
    "image_embedding": image_model.encode(item.image),
}
feature_store.write(item.id, item_features)

Aggregate Features (Updated Periodically)

Computed from user interactions, updated in batch.

Feature	Example	Update Frequency
Popularity	View count, purchase count	Hourly/Daily
Avg rating	Mean of user ratings	Daily
CTR	Historical click-through rate	Daily
Conversion rate	Purchases / views	Daily
Co-purchase stats	“Often bought with X”	Weekly

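# Offline aggregation job
item_stats = {
    "view_count_7d": count_views(item, days=7),
    "purchase_count_7d": count_purchases(item, days=7),
    # Items without ratings get None rather than raising on an empty list
    "avg_rating": mean(item.ratings) if item.ratings else None,
    # Guard the ratios: new items can have zero impressions or views
    "ctr": item.clicks / item.impressions if item.impressions else 0.0,
    "conversion_rate": item.purchases / item.views if item.views else 0.0,
}
feature_store.update(item.id, item_stats)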

Real-Time Features

Updated continuously via streaming.

Feature	Example	Update Frequency
Stock status	In stock, low stock, out	Real-time
Current price	After discounts	Real-time
Trending score	Recent velocity of views	Minutes
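
As an illustration of a streaming feature, a trending score can be approximated by counting views inside a sliding time window. This is only a sketch under stated assumptions: `TrendingCounter` is a hypothetical name, timestamps are plain seconds and are assumed to arrive in order, and a real deployment would typically compute this in a stream processor rather than in a single process.

```python
from collections import deque


class TrendingCounter:
    """Counts views per item inside a sliding time window."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self._views = {}  # item_id -> deque of view timestamps (seconds)

    def record_view(self, item_id, timestamp):
        self._views.setdefault(item_id, deque()).append(timestamp)

    def score(self, item_id, now):
        """Return views per minute over the window ending at `now`."""
        views = self._views.get(item_id)
        if not views:
            return 0.0
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while views and views[0] <= cutoff:
            views.popleft()
        return len(views) / (self.window_seconds / 60)


counter = TrendingCounter(window_seconds=600)
for t in [0, 100, 500, 590]:
    counter.record_view("item_1", t)
score_at_600 = counter.score("item_1", now=600)    # three views still in the window
score_at_1200 = counter.score("item_1", now=1200)  # every view has expired
```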

Offline Computation

Most feature computation happens in batch jobs:

┌─────────────────────────────────────────────────────┐
│                    OFFLINE                          │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Data Warehouse                                     │
│       ↓                                             │
│  Batch Processing (Spark/Airflow)                   │
│       ↓                                             │
│  ┌─────────────┐    ┌─────────────┐                │
│  │ User Features│    │Item Features│                │
│  └─────────────┘    └─────────────┘                │
│       ↓                   ↓                         │
│  ┌─────────────────────────────────┐               │
│  │         Feature Store           │               │
│  │   (Redis, DynamoDB, Feast)      │               │
│  └─────────────────────────────────┘               │
│                                                     │
│  Embedding Model Training                           │
│       ↓                                             │
│  ┌─────────────────────────────────┐               │
│  │    ANN Index (FAISS, ScaNN)     │               │
│  └─────────────────────────────────┘               │
│                                                     │
└─────────────────────────────────────────────────────┘

What happens offline:

Train embedding models on historical data
Generate user embeddings from their interaction history
Generate item embeddings from content and interactions
Compute aggregate statistics (popularity, CTR, ratings)
Build ANN index from item embeddings
Push all features to the Feature Store
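
The aggregation and push steps can be sketched end to end on a toy interaction log. Everything here is an assumption made for illustration: the `(item_id, event_type)` log format, the helper names, and the plain dict standing in for the Feature Store.

```python
from collections import Counter

# Hypothetical interaction log: (item_id, event_type) pairs.
events = [
    ("item_1", "impression"), ("item_1", "impression"),
    ("item_1", "click"), ("item_1", "view"), ("item_1", "purchase"),
    ("item_2", "impression"),
]


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def aggregate_item_stats(events):
    """Count events per item and derive CTR and conversion rate."""
    counts = {}  # item_id -> Counter of event types
    for item_id, event_type in events:
        counts.setdefault(item_id, Counter())[event_type] += 1
    stats = {}
    for item_id, c in counts.items():
        stats[item_id] = {
            "ctr": safe_ratio(c["click"], c["impression"]),
            "conversion_rate": safe_ratio(c["purchase"], c["view"]),
        }
    return stats


feature_store = {}  # plain dict standing in for Redis, DynamoDB, or Feast
feature_store.update(aggregate_item_stats(events))
```

At realistic volumes the same group-and-count logic would run as a distributed batch job; the shape of the output (one record of statistics per item) stays the same.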

Online Interaction

When a request arrives, features are assembled and combined:

┌─────────────────────────────────────────────────────┐
│                    ONLINE                           │
├─────────────────────────────────────────────────────┤
│                                                     │
│  User Request                                       │
│       ↓                                             │
│  ┌─────────────────────────────────┐               │
│  │   Fetch from Feature Store      │               │
│  │   - User static features        │               │
│  │   - Item features (for candidates) │            │
│  └─────────────────────────────────┘               │
│       ↓                                             │
│  ┌─────────────────────────────────┐               │
│  │   Compute Real-Time Features    │               │
│  │   - Session context             │               │
│  │   - Current time/location       │               │
│  └─────────────────────────────────┘               │
│       ↓                                             │
│  ┌─────────────────────────────────┐               │
│  │      Combine All Features       │               │
│  │   [user] + [item] + [context]   │               │
│  └─────────────────────────────────┘               │
│       ↓                                             │
│  ┌─────────────────────────────────┐               │
│  │        Scoring Model            │               │
│  │     predict(combined_features)  │               │
│  └─────────────────────────────────┘               │
│       ↓                                             │
│  Ranked Results                                     │
│                                                     │
└─────────────────────────────────────────────────────┘

What happens online:

Receive user request with user_id and context
Fetch pre-computed user features from Feature Store
Run retrieval to get candidate items
Fetch pre-computed item features for each candidate
Compute real-time features (session, context)
Combine user + item + context features into a single vector
Score each candidate with the ranking model
Apply ordering logic and return results
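
The request flow above can be sketched as a single function. The retrieval function, the scorer, and the nested-dict store are all toy stand-ins introduced for this example, not part of any particular library.

```python
def handle_request(user_id, context, feature_store, retrieve, score, top_k=2):
    """Retrieve candidates, assemble features, score, and rank."""
    user_features = feature_store["users"].get(user_id, {})
    candidates = retrieve(user_features)
    scored = []
    for item_id in candidates:
        item_features = feature_store["items"].get(item_id, {})
        combined = {"user": user_features, "item": item_features, "context": context}
        scored.append((score(combined), item_id))
    scored.sort(reverse=True)  # highest score first
    return [item_id for _, item_id in scored[:top_k]]


# Toy stand-ins (all hypothetical).
store = {
    "users": {"u1": {"avg_order_value": 50.0}},
    "items": {"a": {"price": 45.0}, "b": {"price": 500.0}, "c": {"price": 60.0}},
}


def retrieve_all(user_features):
    # A real retriever would return a small candidate set, not the whole catalogue.
    return list(store["items"])


def price_closeness(combined):
    # Toy scorer: prefer items priced near the user's average order value.
    return -abs(combined["item"]["price"] - combined["user"]["avg_order_value"])


results = handle_request("u1", {"device": "mobile"}, store, retrieve_all, price_closeness)
```

Passing `retrieve` and `score` in as arguments keeps the orchestration separate from the models, so either stage can be swapped without touching the flow.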

Feature Combination Example

def score_candidate(user_id, item_id, context):
    # Fetch offline features
    user_features = feature_store.get_user(user_id)
    item_features = feature_store.get_item(item_id)

    # Compute online features
    session_features = compute_session_features(context)

    # Combine into single feature vector
    combined = {
        # User features (keys match what the offline user job wrote)
        "user_embedding": user_features["user_embedding"],
        "user_avg_spend": user_features["avg_order_value"],
        "user_category_affinity": user_features["top_categories"],

        # Item features (keys match what the offline item jobs wrote)
        "item_embedding": item_features["content_embedding"],
        "item_price": item_features["price"],
        "item_popularity": item_features["view_count_7d"],
        "item_ctr": item_features["ctr"],

        # Context features
        "hour_of_day": session_features["current_hour"],
        "device": session_features["device"],
        "items_viewed_this_session": len(session_features["session_items"]),

        # Interaction features
        # Guard against users with no orders (average order value of 0)
        "price_vs_user_avg": (
            item_features["price"] / user_features["avg_order_value"]
            if user_features["avg_order_value"] else 0.0
        ),
        "user_item_category_match": category_match_score(user_features, item_features),
    }

    return ranking_model.predict(combined)
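
Most ranking models expect a flat numeric vector rather than a dict of mixed types, so the combined features usually need an encoding step first. The sketch below shows one possible encoding; the `DEVICES` vocabulary, the scaling of the hour, and the choice of fields are assumptions made for illustration.

```python
DEVICES = ["desktop", "mobile", "tablet"]  # assumed device vocabulary


def to_feature_vector(combined):
    """Flatten a combined feature dict into a list of floats."""
    vector = []
    vector.extend(combined["user_embedding"])
    vector.extend(combined["item_embedding"])
    vector.append(float(combined["item_price"]))
    vector.append(float(combined["hour_of_day"]) / 23.0)  # scale hour to [0, 1]
    # One-hot encode the device; unknown devices map to all zeros.
    vector.extend(1.0 if combined["device"] == d else 0.0 for d in DEVICES)
    return vector


example = {
    "user_embedding": [0.1, 0.2],
    "item_embedding": [0.3, 0.4],
    "item_price": 20,
    "hour_of_day": 23,
    "device": "mobile",
}
vector = to_feature_vector(example)
```

The field order must be identical at training and serving time, which is why the encoding is kept in one function.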

Two-Tower Architecture

A common pattern separates user and item encoding:

User Features ──→ User Tower ──→ User Embedding ─┐
                                                  ├──→ Dot Product ──→ Score
Item Features ──→ Item Tower ──→ Item Embedding ─┘

Why this helps:

Item embeddings can be pre-computed offline
Only user embedding needs computation at request time
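Because the score is a dot product, retrieval becomes a nearest-neighbour search that an ANN index over the item embeddings can serve quickly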
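
A minimal sketch of two-tower retrieval, assuming item embeddings were already written by an offline job. The `user_tower` here is a placeholder that returns a stored vector unchanged, where a trained network would normally sit, and the exhaustive scan stands in for an ANN index such as FAISS or ScaNN.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Item embeddings as an offline job might have written them (toy values).
item_embeddings = {
    "coat": [0.9, 0.1],
    "novel": [0.1, 0.9],
    "scarf": [0.8, 0.3],
}


def user_tower(user_features):
    """Placeholder for a trained user tower: returns the stored vector."""
    return user_features["embedding"]


def retrieve_top_k(user_features, k=2):
    user_vec = user_tower(user_features)
    # Exhaustive scan; an ANN index would approximate this step at scale.
    return heapq.nlargest(
        k,
        item_embeddings,
        key=lambda item_id: dot(user_vec, item_embeddings[item_id]),
    )


top_items = retrieve_top_k({"embedding": [1.0, 0.0]}, k=2)
```

Only `user_tower` runs per request; the item side is a lookup, which is what makes the pattern cheap to serve.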
