Design a Meta posts storage system
Problem Statement
Design the storage and retrieval layer for a social media platform's posts — like Facebook's post system. Users create posts with text and media (photos, videos), and others view these posts on profiles, in feeds, and via direct links. The system must handle billions of posts, serve reads with low latency, and enforce privacy settings.
The core challenges are: designing a data model that supports efficient timeline queries (show me this user's last 20 posts), handling large media files without bloating the primary database, enforcing visibility rules on every read, and scaling writes during peak activity (live events, holidays).
Key Requirements
Functional
- Create posts -- users publish posts with text and optional media; posts appear quickly after creation
- Timeline retrieval -- fetch a user's posts in reverse chronological order with pagination
- Permalink access -- fetch any single post by its ID for direct sharing
- Privacy enforcement -- posts have visibility settings (public, friends-only, private) enforced on every read
Non-Functional
- Scalability -- store billions of posts with thousands of writes and millions of reads per second
- Latency -- timeline queries return in under 200ms; permalink lookups in under 50ms
- Durability -- published posts must never be lost
- Cost efficiency -- media storage must be separate from metadata to control costs
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Model and Partitioning Strategy
How you structure and partition the data determines whether timeline queries are fast and writes scale horizontally.
Hints to consider:
- Partition posts by user_id so all of a user's posts live on the same shard — this makes timeline queries a single-shard scan
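- Use a composite key (user_id, timestamp) with the timestamp sorted in descending order so the most recent posts are read first; add post_id as a tie-breaker if two posts can share a timestamp
- Wide-column stores like Cassandra are well-suited: each partition holds one user's posts, and the rows inside it are kept sorted by time
- Discuss the hot-user problem: accounts with very long post histories or very heavy read traffic (celebrities, large pages) may need sub-partitioning by time range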
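The partitioning hints above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `PostStore` class, the `bucket_for` helper, and the monthly bucket size are hypothetical names and choices, not part of the original problem, and a real wide-column store would do the sorting and partition lookup for you.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, int, str]      # (created_at_ms, post_id, text)
Cursor = Tuple[int, int]        # (created_at_ms, post_id) of the last post seen


def bucket_for(created_at_ms: int) -> str:
    """Monthly time bucket such as '2024-03' (bucket size is an assumption)."""
    dt = datetime.fromtimestamp(created_at_ms / 1000, tz=timezone.utc)
    return f"{dt.year:04d}-{dt.month:02d}"


class PostStore:
    """Toy stand-in for a wide-column table keyed by (user_id, bucket)."""

    def __init__(self) -> None:
        self._partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)

    def put(self, user_id: str, post_id: int, created_at_ms: int, text: str) -> None:
        part = self._partitions[(user_id, bucket_for(created_at_ms))]
        part.append((created_at_ms, post_id, text))
        part.sort(reverse=True)  # keep each partition newest-first

    def timeline(
        self, user_id: str, limit: int = 20, cursor: Optional[Cursor] = None
    ) -> Tuple[List[Row], Optional[Cursor]]:
        """Return up to `limit` posts older than `cursor`, plus the next cursor."""
        # A real system would track a user's buckets instead of scanning all keys.
        buckets = sorted(
            (b for (u, b) in self._partitions if u == user_id), reverse=True
        )
        page: List[Row] = []
        for bucket in buckets:
            for row in self._partitions[(user_id, bucket)]:
                key = (row[0], row[1])
                if cursor is not None and key >= cursor:
                    continue  # already returned on an earlier page
                page.append(row)
                if len(page) == limit:
                    return page, key
        return page, None  # ran out of posts, so there is no next page
```

Passing the returned cursor back into `timeline` fetches the next page. Using (timestamp, post_id) as the cursor rather than an offset keeps pagination stable when new posts arrive between requests.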
2. Media Handling
Photos and videos are orders of magnitude larger than post metadata. Storing them in the primary database is expensive and degrades query performance.
Hints to consider:
- Store media in object storage (S3) and keep only a reference (URL/key) in the post record
- Upload media asynchronously: accept the post immediately, process media (resize, transcode) in the background
- Serve media through a CDN for low-latency global access and reduced origin load
- Discuss how to handle slow or failed uploads: the post might reference media that is still processing or that never finished processing
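One way to picture the media hints above is a post record that stores only object-storage keys plus a per-item processing status. The sketch below is hypothetical throughout: `MediaStatus`, `MediaRef`, `create_post`, `on_media_processed`, `render`, and the `cdn.example.com` host are illustrative names, and the background worker is reduced to a plain function call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

CDN_BASE = "https://cdn.example.com"  # placeholder host, not a real endpoint


class MediaStatus(Enum):
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


@dataclass
class MediaRef:
    object_key: str  # key in object storage; the bytes never enter the post DB
    status: MediaStatus = MediaStatus.PROCESSING


@dataclass
class Post:
    post_id: str
    author_id: str
    text: str
    media: List[MediaRef] = field(default_factory=list)


def create_post(post_id: str, author_id: str, text: str, upload_keys: List[str]) -> Post:
    """Accept the post immediately; every media item starts as PROCESSING."""
    return Post(post_id, author_id, text, [MediaRef(k) for k in upload_keys])


def on_media_processed(post: Post, object_key: str, ok: bool) -> None:
    """Callback a background worker would invoke after resize/transcode."""
    for ref in post.media:
        if ref.object_key == object_key:
            ref.status = MediaStatus.READY if ok else MediaStatus.FAILED


def render(post: Post) -> Dict[str, object]:
    """Read path: expose a CDN URL only for media that finished processing."""
    media = [
        {"url": f"{CDN_BASE}/{ref.object_key}"}
        if ref.status is MediaStatus.READY
        else {"placeholder": ref.status.value}
        for ref in post.media
    ]
    return {"post_id": post.post_id, "text": post.text, "media": media}
```

Returning a placeholder for items that are still processing or have failed lets the text of the post appear right away, while the client decides whether to show a spinner, retry, or hide the attachment.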
3. Privacy and Visibility Enforcement
Every read must check whether the requesting user is allowed to see the post. This check must be fast and correct.
Hints to consider:
- Store the visibility setting with each post (public, friends_only, only_me, custom list)
- For timeline queries, filter posts by visibility after retrieval (read-time filtering), reading past hidden posts so a page is still full after filtering
- Cache the friendship graph for the requesting user to speed up "friends_only" checks
- Discuss edge cases: when a user changes a post from public to friends-only, cached or indexed copies must be invalidated or updated so the old visibility is never served
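The read-time check described above might look like the following minimal sketch. The `Visibility` enum, `Post` fields, `can_view`, and `visible_page` are assumed names for illustration, and the viewer's friend set is passed in directly where a real system would read it from a cached friendship graph.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class Visibility(Enum):
    PUBLIC = "public"
    FRIENDS_ONLY = "friends_only"
    ONLY_ME = "only_me"
    CUSTOM = "custom"


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: str
    visibility: Visibility
    allowed_viewers: FrozenSet[str] = frozenset()  # used only for CUSTOM


def can_view(post: Post, viewer_id: str, viewer_friends: FrozenSet[str]) -> bool:
    """Decide whether viewer_id may see the post; anything unrecognised is denied."""
    if viewer_id == post.author_id:
        return True
    if post.visibility is Visibility.PUBLIC:
        return True
    if post.visibility is Visibility.FRIENDS_ONLY:
        return post.author_id in viewer_friends
    if post.visibility is Visibility.CUSTOM:
        return viewer_id in post.allowed_viewers
    return False  # ONLY_ME, or any value added later without a rule


def visible_page(
    posts_newest_first: Iterable[Post],
    viewer_id: str,
    viewer_friends: FrozenSet[str],
    limit: int,
) -> List[Post]:
    """Scan past hidden posts until the page holds `limit` visible ones."""
    page: List[Post] = []
    for post in posts_newest_first:
        if can_view(post, viewer_id, viewer_friends):
            page.append(post)
            if len(page) == limit:
                break
    return page
```

Defaulting to deny means a visibility value added later cannot leak posts by accident. Because the check runs on every read against the stored setting, a change from public to friends-only takes effect as soon as the stored record (and any cache of it) is updated.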