Problem Statement
Design a system that tracks overall public sentiment toward Netflix on social media over time. The system should ingest social media posts, classify sentiment, aggregate results, and surface trends, so that stakeholders can monitor how public perception shifts in response to content releases, pricing changes, or other events.
This is an ML system design problem that combines data engineering, natural language processing, and analytics infrastructure. You should be prepared to discuss data ingestion pipelines, sentiment classification models, aggregation strategies, and how to visualize trends for business stakeholders.
Key Requirements
Functional
- Social media data ingestion -- collect posts from Twitter, Reddit, Instagram, and other platforms that mention Netflix or related keywords
- Sentiment classification -- apply an ML model to classify each post as positive, negative, or neutral
- Aggregation and trending -- aggregate sentiment scores over time windows (hourly, daily, weekly) to detect shifts
- Event correlation -- enable stakeholders to correlate sentiment changes with specific events (new show releases, pricing announcements)
- Dashboard and alerting -- surface aggregated sentiment metrics and alert when sentiment drops significantly
Non-Functional
- High throughput -- process millions of social media posts per day across multiple platforms
- Near real-time processing -- sentiment scores should be updated frequently enough (on the order of minutes rather than days) to catch rapid shifts
- Scalability -- the ingestion and processing pipeline should scale horizontally as data volume grows
- Model accuracy -- sentiment classification should be accurate enough to provide actionable insights
- Data retention -- store historical sentiment data for trend analysis over months or years
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Ingestion Pipeline
Interviewers want to see how you collect data from multiple social media platforms and handle the volume and variety of posts.
Hints to consider:
- Use streaming APIs where a platform offers them, and polling or batch ingestion for platforms without streaming support
- Use a message queue (e.g. Kafka) to buffer incoming posts before processing
- Handle rate limits and API quotas from social media platforms
- Filter posts to those relevant to Netflix using keyword matching or entity recognition
- Store raw posts for reprocessing if the sentiment model is updated
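The filtering hint above can be made concrete with a small sketch. This is a minimal illustration, not a production design: the keyword list, the post dictionary shape (`platform`, `text`), and the helper names are all assumptions, and the in-memory `seen` set stands in for whatever bounded deduplication store a real pipeline would use.

```python
import hashlib
import re

# Hypothetical keyword list; a real one would be maintained alongside the content catalog.
KEYWORDS = ("netflix", "stranger things", "squid game")
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_relevant(text: str) -> bool:
    """Return True if the post mentions any tracked keyword as a whole word or phrase."""
    return KEYWORD_RE.search(text) is not None


def content_key(platform: str, text: str) -> str:
    """Hash the platform plus case- and whitespace-normalized text to spot duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(f"{platform}:{normalized}".encode("utf-8")).hexdigest()


def filter_posts(posts):
    """Yield relevant posts, skipping duplicates seen earlier in the same stream.

    Assumption: the `seen` set is unbounded here; a long-running consumer would
    need a TTL cache or similar bounded structure instead.
    """
    seen = set()
    for post in posts:
        if not is_relevant(post["text"]):
            continue
        key = content_key(post["platform"], post["text"])
        if key in seen:
            continue
        seen.add(key)
        yield post
```

Keyword matching is cheap but ambiguous for generic titles, which is where entity recognition becomes worth discussing with the interviewer.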
2. Sentiment Classification Model
Interviewers probe your choice of ML model and how you would train and deploy it.
Hints to consider:
- Pre-trained models (BERT, RoBERTa) fine-tuned on social media sentiment datasets
- Simpler baseline models (logistic regression on TF-IDF features) for comparison
- Handling emoji, slang, and informal language common in social media
- Multi-class classification (positive, negative, neutral) or regression (sentiment score)
- Consider domain-specific vocabulary (show names, character names, Netflix-specific terms)
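One way to talk about emoji, slang, and informal language is to show a normalization step that runs before the classifier. The sketch below assumes tiny hypothetical lookup tables (`EMOJI_TOKENS`, `SLANG`) and a `normalize` helper; none of these come from a specific library. A fine-tuned transformer may need less of this than a TF-IDF baseline does.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
EMOJI_TOKENS = {
    "\U0001F60D": " emoji_love ",   # smiling face with heart-eyes
    "\U0001F621": " emoji_angry ",  # pouting face
    "\U0001F602": " emoji_laugh ",  # face with tears of joy
}
SLANG = {"luv": "love", "gr8": "great", "tbh": "to be honest"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more repeats of the same character


def normalize(text: str) -> str:
    """Normalize a social media post before feature extraction or tokenization."""
    text = URL_RE.sub(" ", text)             # drop links
    text = MENTION_RE.sub(" @user ", text)   # anonymize mentions
    for emoji, token in EMOJI_TOKENS.items():
        text = text.replace(emoji, token)    # keep the emoji signal as a word
    text = REPEAT_RE.sub(r"\1\1", text.lower())  # "sooooo" -> "soo"
    words = [SLANG.get(word, word) for word in text.split()]
    return " ".join(words)
```

Mapping emoji to tokens rather than deleting them keeps a strong sentiment signal available to simpler models.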
3. Aggregation Strategy
Interviewers want to see how you aggregate individual sentiment scores into meaningful trends.
Hints to consider:
- Time-bucketed aggregation (hourly, daily) with rolling averages to smooth noise
- Weight posts by engagement (likes, retweets, comments) rather than treating all posts equally
- Segment by platform (Twitter vs Reddit may have different sentiment baselines)
- Detect anomalies or significant drops in sentiment scores
- Store aggregated metrics in a time-series database (InfluxDB, TimescaleDB)
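The bucketing, weighting, and smoothing hints above can be sketched as follows. This assumes each classified post is a dictionary with `platform`, `ts` (a `datetime`), `score` in the range -1 to 1, and engagement counts; the log-damped weight is one plausible choice, not a prescribed formula.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def engagement_weight(likes: int, shares: int, comments: int) -> float:
    """Assumed weighting: log damping so one viral post cannot dominate a bucket."""
    return 1.0 + math.log1p(likes + shares + comments)


def aggregate_hourly(posts):
    """Return {(platform, hour): engagement-weighted mean sentiment}."""
    sums = defaultdict(lambda: [0.0, 0.0])  # [weighted score sum, weight sum]
    for p in posts:
        key = (p["platform"], hour_bucket(p["ts"]))
        w = engagement_weight(p["likes"], p["shares"], p["comments"])
        sums[key][0] += w * p["score"]
        sums[key][1] += w
    return {key: s / w for key, (s, w) in sums.items()}


def rolling_mean(values, window):
    """Trailing rolling mean; early points average over the values seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out
```

Keeping the platform in the bucket key preserves per-platform baselines, and the resulting rows map directly onto a time-series table keyed by timestamp and platform.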
4. Offline ML Training Pipeline
Interviewers expect you to discuss how the sentiment model is trained and retrained over time.
Hints to consider:
- Labeled training data from public sentiment datasets or manual annotation
- Periodic retraining to adapt to evolving language and new shows/events
- Evaluation metrics (accuracy, F1 score per class) with special attention to class imbalance
- A/B testing framework to compare model versions before full deployment
- Batch pipeline for training data preparation (e.g. Spark for processing, with a scheduler such as Airflow for orchestration)
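To show why accuracy alone misleads under class imbalance, it helps to compute per-class F1 by hand. The sketch below is a plain-Python illustration with assumed label names; in practice an evaluation library would do this, but the arithmetic is what an interviewer wants to hear.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Compute F1 for each label from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label wrongly
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    return sum(scores.values()) / len(scores)
```

For example, a model that predicts "neutral" for every post in a set that is 80% neutral reaches 80% accuracy, yet its F1 on the negative class is zero, which is exactly the class the alerting use case depends on.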
5. Visualization and Alerting
Interviewers want to see how you surface insights to stakeholders.
Hints to consider:
- Dashboard showing sentiment trend lines over time with drill-down by platform or keyword
- Heatmaps or geographic breakdowns if location data is available
- Alerting system that triggers when sentiment drops below a threshold or changes rapidly
- Correlation view that overlays sentiment trends with event timelines (show releases, announcements)
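The alerting hint covers two conditions: an absolute floor and a rapid change between consecutive windows. A minimal sketch of such a rule follows; the `AlertRule` name and its default thresholds are hypothetical and would be tuned per platform against historical data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    floor: float = -0.2    # hypothetical: alert if sentiment falls below this value
    max_drop: float = 0.3  # hypothetical: alert if sentiment falls this much in one window


def check_alert(previous: float, current: float, rule: AlertRule = AlertRule()) -> Optional[str]:
    """Return an alert message for the current window, or None if nothing fires."""
    if current < rule.floor:
        return f"sentiment {current:.2f} is below floor {rule.floor:.2f}"
    drop = previous - current
    if drop > rule.max_drop:
        return f"sentiment fell by {drop:.2f} in one window"
    return None
```

Evaluating the rule on smoothed aggregates rather than raw buckets reduces false alarms from low-volume hours.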
Suggested Approach
Step 1: Clarify Requirements
Ask about the scale (posts per day), latency requirements (real-time vs batch), which social media platforms to prioritize, and whether the interviewer wants to focus on the ML model or the data pipeline.
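Once the interviewer gives you a daily volume, a quick back-of-envelope calculation turns it into throughput and storage figures. The sketch below uses purely illustrative inputs (5 million posts per day, a 10x peak factor, 2,000 bytes per raw post); substitute whatever numbers you are given.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capacity_estimate(posts_per_day: int, peak_factor: float, bytes_per_post: int) -> dict:
    """Convert a daily post volume into average and peak rates and raw storage per day."""
    avg_per_sec = posts_per_day / SECONDS_PER_DAY
    return {
        "avg_posts_per_sec": avg_per_sec,
        "peak_posts_per_sec": avg_per_sec * peak_factor,
        "raw_gb_per_day": posts_per_day * bytes_per_post / 1e9,
    }


# Illustrative assumptions only, not figures from any real platform.
estimate = capacity_estimate(posts_per_day=5_000_000, peak_factor=10, bytes_per_post=2_000)
```

With these assumed inputs the average rate is roughly 58 posts per second and raw storage grows by about 10 GB per day, which suggests the classifier's inference cost, not the queue, is the likely bottleneck to discuss.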
Step 2: High-Level Architecture
Sketch these core components:
- Data Ingestion Layer -- API connectors for social media platforms feeding into a message queue (Kafka)
- Stream Processing -- filters and preprocesses posts (deduplication, keyword filtering)
- Sentiment Classification Service -- stateless service that loads the ML model and returns sentiment scores
- Aggregation Layer -- stream processor (Flink or Spark Streaming) that computes time-bucketed aggregates
- Storage -- raw post storage (S3 or data lake), aggregated metrics in time-series DB
- Offline Training Pipeline -- batch pipeline for model training and evaluation
- Dashboard and Alerting -- visualization layer (Grafana or custom web app) with alerting rules
Step 3: Deep Dive on Sentiment Model
Discuss your model choice (transformer-based vs simpler models), training data sources, evaluation metrics, and how you handle social media-specific challenges (sarcasm, emojis, slang).
Step 4: Deep Dive on Aggregation and Trending
Explain how you compute aggregated sentiment over time windows, how you weight posts, and how you detect significant sentiment shifts.
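For detecting significant shifts, one simple option to describe is a z-score of each new bucket against a trailing baseline. This is a sketch of that one technique under assumed parameters (`baseline_window=24` hourly buckets, `z_threshold=3.0`); seasonal patterns or low-volume buckets may call for something more robust.

```python
from statistics import mean, stdev


def detect_shifts(series, baseline_window=24, z_threshold=3.0):
    """Flag points that deviate strongly from the trailing baseline.

    `series` is a list of aggregated sentiment values in time order.
    Returns a list of (index, z_score) for each flagged point.
    """
    flagged = []
    for i in range(baseline_window, len(series)):
        baseline = series[i - baseline_window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # a perfectly flat baseline gives no scale to compare against
        z = (series[i] - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append((i, z))
    return flagged
```

A negative z-score marks a drop and a positive one a spike, so the same routine supports both "sentiment fell after a pricing change" and "sentiment rose after a release" views.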
Step 5: Monitoring and Iteration
Discuss how you monitor model accuracy in production, collect feedback for retraining, and iterate on the system as new platforms or data sources are added.
Real Interview Quotes
"Design a system that can track social media sentiment about Netflix over time. I structured my answer around three pillars: data ingestion from social media sources, aggregation of sentiment signals, and offline ML training for the sentiment classification model."