Online Retrieval And Ranking

In recommender systems that serve catalogs of millions of items, the architecture is typically split into a multi-stage funnel that balances accuracy against low-latency requirements. The funnel separates retrieval (also called candidate generation) from ranking, and the ranked list often passes through a final post-ranking layer of business rules. This staged design is part of a broader hybrid offline-to-online architecture.

Retrieval (Candidate Generation)

The objective of the retrieval stage is to quickly select a small, relevant subset of items (typically dozens or hundreds) from a massive catalog that may contain millions or billions of items.

Function: Scoring every item in a large catalog with a complex model is too computationally expensive for real-time requests, so retrieval acts as a coarse filter.
Techniques: This stage often utilises computationally efficient methods. For new users, systems may fall back on non-personalised approaches, such as recommending popular content, to work around the “cold start” problem. More advanced retrieval relies on embeddings, where a model generates vector representations of products and users. The item vectors are stored in an approximate nearest neighbours (ANN) index, which is queried with the user's vector for fast lookup. Embedding models may be retrained less frequently than ranking models (e.g., once a week), as item characteristics change slowly.
Output: The result is a reduced list of candidates that are likely to be relevant to the user, passed downstream to the ranking system.
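
The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the function names (`retrieve`, `cosine`), the two-dimensional embeddings and the popularity list are all hypothetical, and the exact scan over every item stands in for the ANN index, which a real system would query instead to avoid touching the whole catalog.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(user_vec, item_vecs, k, popular_items):
    """Return up to k candidate item IDs for one request.

    Falls back to a precomputed popularity list when the user has no
    embedding yet (the cold-start case).
    """
    if user_vec is None:
        return popular_items[:k]
    # Exact scan over every item: a stand-in for the ANN index lookup.
    scored = ((cosine(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical 2-dimensional embeddings, for illustration only.
ITEM_VECS = {
    "item_a": [1.0, 0.0],
    "item_b": [0.8, 0.6],
    "item_c": [0.0, 1.0],
    "item_d": [-1.0, 0.0],
}
POPULAR = ["item_c", "item_a", "item_d", "item_b"]

candidates = retrieve([1.0, 0.1], ITEM_VECS, k=2, popular_items=POPULAR)
cold_start = retrieve(None, ITEM_VECS, k=2, popular_items=POPULAR)
print(candidates)  # ['item_a', 'item_b']
print(cold_start)  # ['item_c', 'item_a']
```

The exact scan costs time proportional to the catalog size per request, which is the cost an ANN index trades a small amount of recall to avoid.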

Ranking

The ranking stage takes the candidates provided by the retrieval system and applies a more sophisticated model to order them precisely based on the user’s predicted interest.

Function: This stage focuses on high accuracy. It scores the limited set of retrieved items to determine the exact order in which they should be presented.
Techniques: Ranking models are often deep learning architectures, such as Neural Collaborative Filtering (NCF). In this approach, user and item IDs are mapped to dense embedding vectors, concatenated, and fed through a multi-layer perceptron to output a predicted score, such as a star rating or interaction probability.
Multi-Objective Optimization: Ranking often involves decoupling objectives. For instance, a newsfeed might use separate models to predict different outcomes, with one model predicting the likelihood of a click (engagement) and another predicting the quality of the post. The final rank is determined by combining these scores, allowing the system to balance user engagement with content quality.
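
As a rough sketch of the two ideas above, the following Python shows an NCF-style forward pass (embeddings concatenated and passed through a single hidden layer) used as the click model, blended with a separate quality score. Everything here is assumed for illustration: the weights are hand-set rather than trained, the quality scores are made up, and the weighted sum in `blend` is only one possible way to combine the scores, since the text does not specify a formula.

```python
import math


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ncf_score(user_emb, item_emb, hidden_w, hidden_b, out_w, out_b):
    """NCF-style forward pass: concatenate, one ReLU hidden layer, sigmoid output."""
    x = user_emb + item_emb  # list concatenation of the two embeddings
    hidden = [
        relu(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return sigmoid(logit)  # read here as an interaction (click) probability


def blend(p_click, quality, click_weight):
    """Assumed combination rule: a simple weighted sum of the two scores."""
    return click_weight * p_click + (1.0 - click_weight) * quality


def rank(candidates, click_model, quality_scores, click_weight=0.7):
    """Order candidates by blended score, highest first."""
    scored = [
        (blend(click_model(c), quality_scores[c], click_weight), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]


# Hypothetical, hand-set parameters (a real model would learn these).
USER_EMB = [1.0, 0.0]
ITEM_EMB = {"post_x": [1.0, 0.0], "post_y": [0.0, 1.0]}
HIDDEN_W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUT_W = [1.0, -1.0]
OUT_B = 0.0
QUALITY = {"post_x": 0.2, "post_y": 0.9}  # output of a separate quality model


def click_model(item_id):
    return ncf_score(USER_EMB, ITEM_EMB[item_id], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)


engagement_first = rank(["post_x", "post_y"], click_model, QUALITY, click_weight=0.7)
quality_first = rank(["post_x", "post_y"], click_model, QUALITY, click_weight=0.3)
print(engagement_first)  # ['post_x', 'post_y']
print(quality_first)     # ['post_y', 'post_x']
```

Keeping the two predictions separate means the balance between engagement and quality can be changed by adjusting `click_weight` alone, without retraining either model.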

Post-Ranking

Following the ranking stage, the ordered list often goes through a layer of business rules. This step filters out items based on specific criteria, such as removing content the user has explicitly disliked or ensuring the recommendations meet current business segmentation rules, before the final list is sent to the user’s device.
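
A business-rules layer of this kind can be as simple as an order-preserving filter. The sketch below assumes a hypothetical rule set (a per-user set of disliked item IDs and a set of segments the user may see); the names `apply_business_rules`, `disliked_ids` and `allowed_segments` are illustrative and not taken from any particular system.

```python
def apply_business_rules(ranked_items, disliked_ids, allowed_segments, item_segment):
    """Drop disliked items and items outside the allowed segments.

    The relative order produced by the ranking stage is preserved.
    """
    return [
        item
        for item in ranked_items
        if item not in disliked_ids and item_segment.get(item) in allowed_segments
    ]


# Hypothetical ranked list and rule inputs, for illustration only.
ranked = ["post_x", "post_y", "post_z", "post_w"]
final = apply_business_rules(
    ranked,
    disliked_ids={"post_y"},
    allowed_segments={"general"},
    item_segment={
        "post_x": "general",
        "post_y": "general",
        "post_z": "premium",
        "post_w": "general",
    },
)
print(final)  # ['post_x', 'post_w']
```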
