ML System Design - Feature Store

[ INFO ] category: System Design · ML System Design | difficulty: unknown | freq: | first seen: 2026-05-28

Feature Store

Problem Statement

Design a feature store for Reddit's ML platform. Explain how offline and online feature storage stay consistent, how features are computed and materialized, how training and inference use the same features, and how to handle feature drift.

Constraints

- The feature store should support both batch and real-time feature computation.
- The feature store should scale to large volumes of data.
- The feature store should keep offline and online feature storage consistent.
- The feature store should version features, so that changes to feature definitions are tracked and models can be retrained reproducibly when drift is detected.

Examples

- Offline feature storage: Features are computed in batch and stored in a data warehouse or a distributed file system, typically for training.
- Online feature storage: Features are computed and served in real time, typically from a cache or a key-value store, for inference.
- Consistency between offline and online feature storage: The feature values a model sees at inference should match the ones it was trained on, even though they are computed in different environments. A mismatch between the two is commonly called training-serving skew.
- Feature versioning: The feature store should keep multiple versions of a feature so that changes in the underlying data or in model requirements do not silently alter existing models.

Hints

- Consider using a combination of data storage technologies, such as data warehouses, distributed file systems, and in-memory caches, to support both batch and real-time feature computation.
- Implement feature versioning by tagging features with version numbers and storing multiple versions of the same feature.
- Use a feature catalog to track the lineage and dependencies of features, making it easier to manage and debug the feature store.
- Monitor feature drift by regularly comparing the distribution of features in the training and inference datasets.

Solution

To design a feature store for Reddit's ML platform, we can follow these steps:

Data Ingestion and Storage:
- Use a data warehouse or a distributed file system to store raw data and batch-computed features, keeping the timestamped history that training needs.
- Use a low-latency key-value store or in-memory cache to hold the latest value of each feature for inference.
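
The two storage roles can be sketched as follows. This is a minimal illustration that assumes in-memory stand-ins rather than a real warehouse or key-value store; the names `FeatureRow`, `OfflineStore`, `OnlineStore`, and `materialize` are hypothetical. The offline store keeps every row with its event time, while the online store keeps only the newest value per entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRow:
    entity_id: str    # e.g. a user or post id (hypothetical)
    event_time: int   # seconds since epoch
    value: float


class OfflineStore:
    """Append-only history, standing in for a warehouse table."""

    def __init__(self):
        self.rows = []

    def write(self, feature, row):
        self.rows.append((feature, row))


class OnlineStore:
    """Latest value per (feature, entity), standing in for a key-value store."""

    def __init__(self):
        self.latest = {}

    def write(self, feature, row):
        key = (feature, row.entity_id)
        current = self.latest.get(key)
        # Ignore late-arriving rows that are older than what is already stored.
        if current is None or row.event_time >= current.event_time:
            self.latest[key] = row

    def read(self, feature, entity_id):
        row = self.latest.get((feature, entity_id))
        return None if row is None else row.value


def materialize(feature, rows, offline, online):
    """Write the same rows to both stores so they share one source of values."""
    for row in rows:
        offline.write(feature, row)
        online.write(feature, row)
```

Writing both stores from one `materialize` call is one possible design choice: it keeps the online value a strict subset of the offline history, which makes later consistency checks simpler.
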
Feature Computation and Materialization:
- Schedule batch jobs that periodically compute features and write (materialize) them to offline storage.
- Use a stream processing framework to compute features from live events and materialize them to online storage in real time.
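
As a rough illustration of the two computation paths, the sketch below computes one windowed count feature both ways. It assumes a hypothetical "votes in the last 7 days" feature and plain Python in place of a real batch engine or stream processor; `batch_vote_counts` and `StreamingVoteCounter` are illustrative names, not part of any library.

```python
from collections import Counter, defaultdict, deque

WINDOW_SECONDS = 7 * 24 * 3600  # assumed 7-day window


def batch_vote_counts(events, as_of):
    """Batch path: count each user's events in (as_of - window, as_of]."""
    counts = Counter()
    for user_id, event_time in events:
        if as_of - WINDOW_SECONDS < event_time <= as_of:
            counts[user_id] += 1
    return dict(counts)


class StreamingVoteCounter:
    """Streaming path: maintain the same windowed count incrementally."""

    def __init__(self):
        self._times = defaultdict(deque)

    def update(self, user_id, event_time):
        # Assumes events arrive in event-time order for each user.
        self._times[user_id].append(event_time)

    def value(self, user_id, as_of):
        # Assumes as_of never moves backwards between calls,
        # because expired events are evicted permanently.
        times = self._times[user_id]
        while times and times[0] <= as_of - WINDOW_SECONDS:
            times.popleft()
        return sum(1 for t in times if t <= as_of)
```

Both paths use the same window boundaries, so for the same events and the same `as_of` they should return the same count.
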
Feature Versioning and Catalog:
- Implement feature versioning by tagging features with version numbers and storing multiple versions of the same feature.
- Maintain a feature catalog to track the lineage and dependencies of features, making it easier to manage and debug the feature store.
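
A catalog with versioning and lineage could look roughly like the sketch below. It assumes an in-memory registry keyed by `(name, version)`; `FeatureDefinition`, `FeatureCatalog`, and the example source names are hypothetical. A dependency is either a raw source name (a string) or another feature's `(name, version)` key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    description: str
    depends_on: tuple = ()  # raw source names or (name, version) feature keys


class FeatureCatalog:
    def __init__(self):
        self._defs = {}

    def register(self, definition):
        key = (definition.name, definition.version)
        # Versions are immutable: a changed definition needs a new version.
        if key in self._defs:
            raise ValueError(f"{definition.name} v{definition.version} already registered")
        self._defs[key] = definition

    def get(self, name, version=None):
        """Return a specific version, or the highest version if none is given."""
        if version is None:
            versions = [v for (n, v) in self._defs if n == name]
            if not versions:
                raise KeyError(name)
            version = max(versions)
        return self._defs[(name, version)]

    def lineage(self, name, version):
        """Return all transitive upstream dependencies in discovery order."""
        seen, stack = [], [(name, version)]
        while stack:
            for dep in self._defs[stack.pop()].depends_on:
                if dep not in seen:
                    seen.append(dep)
                    if dep in self._defs:  # only features have further lineage
                        stack.append(dep)
        return seen
```

Pinning a model to an explicit `(name, version)` pair, instead of "latest", is what lets an old model keep reading the feature it was trained on after a new version is registered.
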
Consistency Between Offline and Online Feature Storage:
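- Define each feature's computation logic once and run that same logic in both the batch and streaming paths, so training and inference values do not diverge.
- Use a single feature serving layer that reads historical values from offline storage for training and the latest values from online storage for inference, addressed by the same feature name and version.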
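
One common way to make training see what inference would have seen is a point-in-time lookup: each training label is joined with the feature value as of the label's timestamp, never a later one. The sketch below assumes an in-memory history per entity; `FeatureHistory` and `build_training_rows` are hypothetical names.

```python
from bisect import bisect_right


class FeatureHistory:
    """Per-entity time-ordered values, standing in for offline storage."""

    def __init__(self):
        self._times = {}   # entity_id -> sorted list of event times
        self._values = {}  # entity_id -> values aligned with _times

    def add(self, entity_id, event_time, value):
        times = self._times.setdefault(entity_id, [])
        values = self._values.setdefault(entity_id, [])
        idx = bisect_right(times, event_time)  # keep both lists sorted by time
        times.insert(idx, event_time)
        values.insert(idx, value)

    def as_of(self, entity_id, timestamp):
        """Training path: latest value with event_time <= timestamp, or None."""
        times = self._times.get(entity_id, [])
        idx = bisect_right(times, timestamp)
        return self._values[entity_id][idx - 1] if idx else None

    def latest(self, entity_id):
        """Inference path: the most recent value, or None."""
        values = self._values.get(entity_id)
        return values[-1] if values else None


def build_training_rows(labels, history):
    """Join each (entity_id, label_time, label) with the value as of label_time."""
    return [
        (entity_id, history.as_of(entity_id, label_time), label)
        for entity_id, label_time, label in labels
    ]
```

Using `as_of` rather than `latest` when building training data avoids leaking feature values that were only computed after the label event.
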
Handling Feature Drift:
- Monitor feature drift by regularly comparing each feature's distribution in the training data against its distribution in recent inference traffic.
- Retrain models when significant feature drift is detected, and register new feature versions in the feature store as needed.
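
The distribution comparison can be done in several ways. The sketch below uses the Population Stability Index (PSI) as one example metric; this is a swapped-in choice, not something the problem prescribes. The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are assumptions, and both inputs are assumed to be non-empty lists of numbers.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a serving sample.

    Bin edges come from the expected (training) sample; serving values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A threshold such as 0.2 is often quoted as a rule of thumb for "significant" drift, but the right alerting threshold depends on the feature and the model and should be treated as a tunable assumption.
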

By following these steps, we can design a feature store that supports both batch and real-time feature computation, ensures consistency between offline and online feature storage, and handles feature drift effectively.
