Practice/Apple/Design a Meta post storage and query system

Design a Meta post storage and query system

System DesignOptional

Problem Statement

Design a system that stores social media posts with associated tags and keywords, and lets users search for posts using Boolean operations on those tags. For example, a user might search for posts tagged with both #sports AND #basketball, or posts tagged with either #cooking OR #baking. Think of Instagram hashtag search or Facebook keyword filtering, but with support for combining tags.

The challenge is building and scaling an inverted index that maps tags to posts, executing set operations (union, intersection) efficiently over potentially millions of results, and keeping the index consistent with the primary data store as posts are created, updated, and deleted.

Key Requirements

Functional

Post creation with tags -- users create posts with text/media and attach one or more tags or keywords
Boolean tag queries -- users search posts by combining tags with AND (intersection) and OR (union) operations
Pagination -- results are paginated with stable ordering (cursor-based) sorted by recency
Tag updates -- users can add/remove tags from existing posts; changes are reflected in search results promptly

Non-Functional

Scalability -- support billions of posts with millions of unique tags and thousands of queries per second
Latency -- tag queries return in under 500ms at P95, even for popular tags with millions of matching posts
Consistency -- new posts appear in tag search results within seconds (eventual consistency is acceptable)
Availability -- search remains available even if the indexing pipeline falls behind

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Inverted Index Design

The inverted index maps each tag to a sorted list of post IDs (a posting list). Efficient Boolean operations on these lists are the foundation of the system.

Hints to consider:

For each tag, maintain a posting list sorted by timestamp (descending) for recency-based results
AND (intersection): merge two sorted posting lists keeping only IDs present in both — use two pointers for O(n+m)
OR (union): merge two sorted lists keeping all IDs — also two pointers
Compress posting lists using delta encoding and variable-byte encoding to save storage and improve cache performance

2. Hot Tag Problem

Some tags are extremely popular (#love has billions of posts). Queries involving hot tags create massive read and compute loads.

Hints to consider:

Shard posting lists by time range: instead of one giant list for #love, have one per day/week
For queries, only scan recent time ranges first and stop once enough results fill the page
Cache the first page of results for popular tag combinations with short TTLs
Consider pre-computing results for the most common tag queries (top 1000 combinations)

3. Keeping the Index Consistent

When posts are created, deleted, or re-tagged, the inverted index must be updated. Doing this synchronously on the write path adds latency and couples systems.

Hints to consider:

Use an async indexing pipeline: write the post to the primary store, publish an event to Kafka, and a consumer updates the index
Make index updates idempotent using post version numbers — if the index already has a newer version, skip the update
Handle deletions with tombstones: mark a post as deleted in the index rather than removing it immediately, clean up in background
Discuss how to backfill or rebuild the index if it gets corrupted or a new tag field is added

Practice/Apple/Design a Meta post storage and query system

Design a Meta post storage and query system

System DesignOptional

Problem Statement

Key Requirements

Functional

Post creation with tags -- users create posts with text/media and attach one or more tags or keywords
Boolean tag queries -- users search posts by combining tags with AND (intersection) and OR (union) operations
Pagination -- results are paginated with stable ordering (cursor-based) sorted by recency
Tag updates -- users can add/remove tags from existing posts; changes are reflected in search results promptly

Non-Functional

Scalability -- support billions of posts with millions of unique tags and thousands of queries per second
Latency -- tag queries return in under 500ms at P95, even for popular tags with millions of matching posts
Consistency -- new posts appear in tag search results within seconds (eventual consistency is acceptable)
Availability -- search remains available even if the indexing pipeline falls behind

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Inverted Index Design

The inverted index maps each tag to a sorted list of post IDs (a posting list). Efficient Boolean operations on these lists are the foundation of the system.

Hints to consider:

For each tag, maintain a posting list sorted by timestamp (descending) for recency-based results
AND (intersection): merge two sorted posting lists keeping only IDs present in both — use two pointers for O(n+m)
OR (union): merge two sorted lists keeping all IDs — also two pointers
Compress posting lists using delta encoding and variable-byte encoding to save storage and improve cache performance

2. Hot Tag Problem

Some tags are extremely popular (#love has billions of posts). Queries involving hot tags create massive read and compute loads.

Hints to consider:

Shard posting lists by time range: instead of one giant list for #love, have one per day/week
For queries, only scan recent time ranges first and stop once enough results fill the page
Cache the first page of results for popular tag combinations with short TTLs
Consider pre-computing results for the most common tag queries (top 1000 combinations)

3. Keeping the Index Consistent

When posts are created, deleted, or re-tagged, the inverted index must be updated. Doing this synchronously on the write path adds latency and couples systems.

Hints to consider:

Use an async indexing pipeline: write the post to the primary store, publish an event to Kafka, and a consumer updates the index
Make index updates idempotent using post version numbers — if the index already has a newer version, skip the update
Handle deletions with tombstones: mark a post as deleted in the index rather than removing it immediately, clean up in background
Discuss how to backfill or rebuild the index if it gets corrupted or a new tag field is added