Hybrid Offline To Online Systems
In a hybrid recommender system architecture, the offline and online components do not operate in isolation; rather, they form a cycle where the offline system provides the foundational intelligence (models and complex features) that enables the online system to deliver low-latency, personalised responses.
A key insight from industry practitioners: most production systems are not purely real-time. Even systems that appear real-time often rely heavily on pre-computed artifacts from offline batch processing.
Here is an overview of how data flows from offline systems into online systems within a hybrid approach.
1. The Offline System: The Source of Intelligence
The offline environment handles the heavy lifting: processing massive historical datasets to generate artifacts that the online system can use for real-time decision-making.
- Model Training and Publishing: The offline system is responsible for reading training data, validating it, and training the model. Because training complex models (like deep neural networks) is computationally intensive, it occurs asynchronously. Once a model is trained and evaluated, the pipeline “publishes” this model to the online system, where it can serve actual traffic. This cycle repeats periodically; as new user data is logged, the offline system retrains the model to capture changing preferences and items.
- Batch Feature Computation: Some features, such as a user’s average spend over the last year or a driver’s long-term rating, change slowly and are expensive to compute. These are known as batch features. The offline system calculates these features periodically and stores them in a low-latency database (often part of a Feature Store). This allows the online system to simply fetch these values rather than computing them from scratch; the first sketch after this list shows such a job.
- Embedding Generation: For candidate retrieval, the offline system trains embedding models and generates vectors for all items. These embeddings are then loaded into an approximate nearest neighbours (ANN) index that the online system queries; the second sketch below walks through this hand-off.
- Item-to-Item Similarities: Pre-computing which items are similar to each other (e.g., “users who bought X also bought Y”) is done offline and stored for fast lookup; the third sketch below shows a minimal version.
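To make the batch-feature idea concrete, here is a sketch of a daily job computing average spend per user. It assumes pandas for the computation, and a plain dict stands in for the Feature Store’s low-latency database; the file name and key format are illustrative:

```python
import pandas as pd

# Daily batch job: compute slow-changing features from historical logs.
# "orders.csv" and the dict-based store are hypothetical stand-ins for a
# warehouse table and a real low-latency feature store.
orders = pd.read_csv("orders.csv")  # columns: user_id, amount, ts

one_year_ago = pd.Timestamp.now() - pd.Timedelta(days=365)
recent = orders[pd.to_datetime(orders["ts"]) >= one_year_ago]

# Average spend per user over the last year: a classic batch feature.
avg_spend = recent.groupby("user_id")["amount"].mean()

feature_store = {}  # stand-in for e.g. a key-value store keyed by user_id
for user_id, value in avg_spend.items():
    feature_store[f"user:{user_id}:avg_spend_365d"] = float(value)
```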
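The embedding hand-off can be sketched with FAISS (an assumption; any ANN library works). The trained embedding model is elided, with random vectors standing in for real item embeddings:

```python
import numpy as np
import faiss  # assumes the faiss library is installed

d = 64           # embedding dimension (illustrative)
n_items = 100_000

# Offline: in a real pipeline these come from a trained embedding model.
item_vectors = np.random.rand(n_items, d).astype("float32")
faiss.normalize_L2(item_vectors)  # so inner product == cosine similarity

index = faiss.IndexFlatIP(d)      # exact index; swap for an IVF index at scale
index.add(item_vectors)
faiss.write_index(index, "items.ann")  # artifact handed off to the online system

# Online: load the index and retrieve candidates for a user vector.
index = faiss.read_index("items.ann")
user_vector = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(user_vector)
scores, item_ids = index.search(user_vector, 500)  # top-500 candidate items
```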
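Finally, a minimal co-occurrence version of “users who bought X also bought Y”, counting item pairs per basket. A real system would normalise by item popularity, but the offline-compute-then-store shape is the same:

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase history: user_id -> set of item_ids.
baskets = {
    "u1": {"X", "Y", "Z"},
    "u2": {"X", "Y"},
    "u3": {"Y", "Z"},
}

# Count how often each pair of items appears in the same basket.
co_counts = defaultdict(Counter)
for items in baskets.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

# Offline artifact: top-N "also bought" list per item, stored for fast lookup.
also_bought = {item: [b for b, _ in ctr.most_common(10)]
               for item, ctr in co_counts.items()}
print(also_bought["X"])  # -> ['Y', 'Z']
```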
2. The Hand-Off: Feeding Data to the Online System
The transition from offline to online involves moving these pre-computed assets into an environment where they can be accessed in milliseconds.
- The Model Store: The trained model (including its definition and parameters) is exported to a Model Store. This acts as the bridge, ensuring the online service loads the correct, versioned model for inference; a publish/load sketch follows this list.
- Feature Stores: To prevent “training-serving skew”—where features calculated offline differ from those used online—hybrid systems often use a Feature Store. The offline system computes complex features and pushes them to the store. When a user request arrives, the online system retrieves these pre-computed features and combines them with real-time data.
- Pre-computed Predictions (Caching): A hybrid approach may also involve pre-computing recommendations for a subset of users. For example, a system might pre-compute predictions for popular queries (the “head”) offline and store them for instant retrieval, while relying on real-time generation for less common queries (the “tail”); the second sketch below illustrates this split.
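A minimal sketch of the model-store hand-off, with a local directory standing in for a real model registry; the layout and “latest” pointer are assumptions, not any specific product’s API:

```python
import json
import pathlib
import pickle

MODEL_DIR = pathlib.Path("model_store")  # local stand-in for a real registry

def publish(model, version: str) -> None:
    """Offline side: export a trained, validated model under an explicit version."""
    path = MODEL_DIR / version
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # works for any picklable model object
    # The "latest" pointer tells the online service which version to load.
    (MODEL_DIR / "latest.json").write_text(json.dumps({"version": version}))

def load_latest():
    """Online side: load the currently published model for serving."""
    version = json.loads((MODEL_DIR / "latest.json").read_text())["version"]
    with open(MODEL_DIR / version / "model.pkl", "rb") as f:
        return pickle.load(f)
```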
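And a sketch of the head/tail split, with a plain dict as the cache and illustrative query strings and item IDs:

```python
# Offline: results pre-computed nightly for the most frequent ("head") queries.
head_cache = {
    "running shoes": ["item_17", "item_42", "item_3"],
    "coffee maker": ["item_8", "item_99"],
}

def recommend(query: str) -> list[str]:
    if query in head_cache:
        return head_cache[query]     # head: instant lookup, no model call
    return generate_realtime(query)  # tail: fall back to the live pipeline

def generate_realtime(query: str) -> list[str]:
    return []  # placeholder for the retrieval + ranking flow in the next section
```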
3. The Online System: Real-Time Execution
The online system generates predictions synchronously when a request arrives (e.g., a user opens an app). It relies on the assets provided by the offline system to achieve low latency.
- Retrieval and Ranking: In many hybrid architectures, the process is split into two stages. The retrieval stage (often powered by offline-trained embeddings or rules) quickly selects a candidate set of items (e.g., hundreds to thousands) from a catalog of millions. The ranking stage then uses a sophisticated model (trained offline) to score and order these items precisely based on the user’s immediate context; the first sketch after this list puts the two stages together.
- Combining Features: To make a prediction, the online system pulls the batch features (provided by the offline system) and combines them with streaming features (dynamic features computed in real-time, such as the user’s current location or session click history). This combination allows the model to be both historically informed and immediately responsive; the second sketch below shows the merge.
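A sketch of the two-stage flow, assuming an ANN index like the one built in section 1 and an offline-trained ranking model; all helper names and feature choices are illustrative:

```python
import numpy as np

def recommend(user_vector: np.ndarray, ann_index, ranking_model, context: dict,
              n_candidates: int = 500, n_results: int = 20) -> list[int]:
    """Two-stage recommendation: cheap retrieval, then precise ranking."""
    # Stage 1 - retrieval: narrow millions of items to a few hundred
    # candidates using the offline-built ANN index.
    _, candidate_ids = ann_index.search(user_vector.reshape(1, -1), n_candidates)
    candidate_ids = candidate_ids[0]

    # Stage 2 - ranking: score each candidate with the offline-trained model,
    # using features that include the user's immediate context.
    features = np.array([build_features(int(i), context) for i in candidate_ids])
    scores = ranking_model.predict(features)

    top = np.argsort(scores)[::-1][:n_results]
    return [int(candidate_ids[i]) for i in top]

def build_features(item_id: int, context: dict) -> list[float]:
    # Placeholder feature assembly; see the feature-merge sketch below.
    return [float(item_id % 7), float(len(context))]
```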
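The feature merge itself might look like the following; the store lookups and feature names are assumptions, and reusing the same feature definitions offline and online is what guards against the training-serving skew mentioned earlier:

```python
def assemble_features(user_id: str, session: dict, feature_store: dict) -> dict:
    """Merge slow batch features with fast streaming features for one request."""
    # Batch features: computed offline, fetched with a single low-latency read.
    batch = {
        "avg_spend_365d": feature_store.get(f"user:{user_id}:avg_spend_365d", 0.0),
    }
    # Streaming features: derived from the live request/session itself.
    streaming = {
        "session_clicks": len(session.get("clicked_items", [])),
        "current_hour": session.get("request_hour", 0),
    }
    return {**batch, **streaming}  # historically informed + immediately responsive
```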
4. The Feedback Loop: Online Feeding Offline
The flow of data is circular. The online system captures user interactions (clicks, views, purchases), which creates a feedback loop.
- Log and Wait: To generate training data for the next offline cycle, systems often use a “log and wait” approach. The online system logs the features it used to make a prediction and then “waits” for the user’s response (the label) to arrive; a sketch follows this list.
- Continuous Improvement: This logged data is fed back into the offline data warehouse, where it becomes the training data for the next iteration of the model.
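A file-based sketch of log and wait; real systems log to a streaming pipeline and join labels in the warehouse, but the shape is the same:

```python
import json
import time
import uuid

def log_prediction(features: dict, prediction: float,
                   log_file: str = "predictions.log") -> str:
    """Online side: record the features exactly as used at serving time."""
    request_id = str(uuid.uuid4())
    record = {"request_id": request_id, "ts": time.time(),
              "features": features, "prediction": prediction, "label": None}
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return request_id

def attach_label(request_id: str, label: int,
                 log_file: str = "predictions.log") -> None:
    """The 'wait' resolves when the user's response (click / no click) arrives.
    In practice this is a warehouse join, not a file rewrite."""
    records = [json.loads(line) for line in open(log_file)]
    for r in records:
        if r["request_id"] == request_id:
            r["label"] = label
    with open(log_file, "w") as f:
        f.writelines(json.dumps(r) + "\n" for r in records)
```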
By utilising this hybrid structure, engineers can leverage the high throughput of offline batch processing for training and complex feature engineering while maintaining the low latency required for online user experiences.
When Real-Time Isn’t Necessary
Not all recommendations need to be computed in real-time. Consider how the latency requirement varies by use case:
| Use Case | Latency Requirement | Approach |
|---|---|---|
| Weekly groceries | Hours/days stale is fine | Batch pre-compute |
| News feed | Minutes matter | Near real-time |
| Search results | Seconds matter | Real-time |
| “Because you watched X” | Days stale is fine | Batch pre-compute |
A user checking their grocery list once a week doesn’t need real-time recommendations. Pre-computing their likely items in a batch job is more cost-effective. But a user actively searching needs immediate results.
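A thin routing layer reflecting the table above might look like this; the surface names and handlers are illustrative:

```python
# Map each surface to the cheapest serving path that meets its latency need.
ROUTES = {
    "weekly_groceries": "batch",       # hours/days stale is fine
    "because_you_watched": "batch",    # days stale is fine
    "news_feed": "near_realtime",      # minutes matter
    "search": "realtime",              # seconds matter
}

def read_precomputed(user_id: str, surface: str) -> list[str]:
    return []  # placeholder: read the nightly batch output

def read_recent(user_id: str, surface: str) -> list[str]:
    return []  # placeholder: read a store refreshed every few minutes

def generate_now(user_id: str, surface: str) -> list[str]:
    return []  # placeholder: run the full retrieval + ranking pipeline

def serve(surface: str, user_id: str) -> list[str]:
    route = ROUTES.get(surface, "realtime")  # default to the freshest path
    if route == "batch":
        return read_precomputed(user_id, surface)
    if route == "near_realtime":
        return read_recent(user_id, surface)
    return generate_now(user_id, surface)
```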
See Also
- User and Item Models — Features computed offline and combined online
- Online — Retrieval and ranking
- Candidate Retrieval
- Approximate Nearest Neighbours