Category: System Design | Difficulty: hard | Frequency: high | First seen: 2026-01-13
Tags: Scheduling, data_engineering, Distributed Systems, State Machine, web, machine_learning, mobile, backend, Reliability, infrastructure
This is Robinhood's signature system design question!
Design a distributed job scheduler system.
Requirements:
- Create jobs that run on a schedule (cron-like)
- Reliably run each job at its scheduled time
- Handle jobs that do not finish on time
- Query the status of past job runs
- Query the logs of past job runs
Key Components to Discuss:
- Scheduler Service - Manages job timing, triggers execution
- Worker Pool - Executes jobs, reports status
- Job Queue - Holds pending jobs (Kafka, SQS)
- Database - Stores job definitions, history, logs
- Monitoring - Tracks job health, raises alerts
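These components split into two halves. The scheduler decides when a job is due and publishes a run to the queue. Stateless workers consume runs and report outcomes, so the pool scales horizontally by adding consumers. Below is a minimal in-process Python sketch of that flow. It assumes a fixed-interval schedule in place of cron, integer timestamps, and hypothetical `Job`, `Scheduler`, `worker`, and `run_pool` names. A `queue.Queue` stands in for Kafka or SQS, and a plain dict stands in for the status table.

```python
import heapq
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    job_id: str
    interval_s: int               # hypothetical fixed-interval schedule standing in for cron
    action: Callable[[], None]


@dataclass(order=True)
class _Entry:
    next_run: int                 # heap is ordered by fire time only
    job_id: str = field(compare=False)


class Scheduler:
    """Decides *when* a job fires and hands it to the queue; never executes it itself."""

    def __init__(self, job_queue: queue.Queue) -> None:
        self.jobs: dict[str, Job] = {}
        self.heap: list[_Entry] = []
        self.job_queue = job_queue

    def add(self, job: Job, now: int) -> None:
        self.jobs[job.job_id] = job
        heapq.heappush(self.heap, _Entry(now + job.interval_s, job.job_id))

    def tick(self, now: int) -> int:
        """Enqueue every occurrence due at or before `now` and return how many were enqueued.
        Missed occurrences (e.g. after scheduler downtime) are all replayed (catch-up)."""
        fired = 0
        while self.heap and self.heap[0].next_run <= now:
            entry = heapq.heappop(self.heap)
            job = self.jobs[entry.job_id]
            run_id = f"{job.job_id}@{entry.next_run}"  # stable id, usable for deduplication
            self.job_queue.put((run_id, job))
            heapq.heappush(self.heap, _Entry(entry.next_run + job.interval_s, job.job_id))
            fired += 1
        return fired


def worker(job_queue: queue.Queue, status: dict[str, str], lock: threading.Lock) -> None:
    """Pull runs until a None sentinel arrives and record each run's outcome."""
    while (item := job_queue.get()) is not None:
        run_id, job = item
        try:
            job.action()
            outcome = "SUCCEEDED"
        except Exception:
            outcome = "FAILED"
        with lock:
            status[run_id] = outcome


def run_pool(job_queue: queue.Queue, n_workers: int) -> dict[str, str]:
    """Drain everything currently queued with n worker threads, then shut them down."""
    status: dict[str, str] = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(job_queue, status, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for _ in threads:
        job_queue.put(None)  # one shutdown sentinel per worker, queued after the real work
    for t in threads:
        t.join()
    return status
```

A heap keyed on next-run time lets each `tick` touch only the jobs that are actually due. In a distributed deployment, that role is more likely played by an indexed next-run-time column in the database. A standby scheduler can then take over from the same state if the active one fails.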
Discussion Points:
- How to handle job failures? (Retry with exponential backoff, dead letter queue)
- How to ensure exactly-once or at-least-once execution?
- How to scale workers horizontally?
- Database schema for jobs and execution history
- How to support long-running jobs while still enforcing timeouts?
- Monitoring, alerting, and observability
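The discussion points about failures, delivery guarantees, schema, and timeouts fit together as one state machine per run. A run moves PENDING → RUNNING → SUCCEEDED. On failure it returns to PENDING with exponential backoff, or moves to DEAD once retries are exhausted, so the DEAD rows act as the dead-letter queue. Below is a minimal sketch using Python's built-in `sqlite3`, so it runs standalone. It assumes integer timestamps in seconds, and the table names, columns, and helper functions are all hypothetical.

A worker claims a run by taking a time-limited lease. If the worker crashes or the job overruns `timeout_s`, the lease expires and another worker may re-claim the run. This yields at-least-once execution, not exactly-once. The `attempt` counter acts as a fencing token, so a stale worker cannot overwrite the outcome of a newer attempt. Legitimately long-running jobs call `heartbeat` to extend their lease.

```python
import random
import sqlite3
from typing import Callable, Optional

SCHEMA = """
CREATE TABLE jobs (
    job_id       TEXT PRIMARY KEY,
    cron_expr    TEXT NOT NULL,
    timeout_s    INTEGER NOT NULL,  -- lease length: a run older than this is presumed stuck
    max_attempts INTEGER NOT NULL   -- after this many failed attempts the run is dead-lettered
);
CREATE TABLE job_runs (             -- one row per scheduled occurrence = execution history
    run_id           TEXT PRIMARY KEY,  -- job_id@scheduled_at, doubles as an idempotency key
    job_id           TEXT NOT NULL REFERENCES jobs (job_id),
    scheduled_at     INTEGER NOT NULL,
    status           TEXT NOT NULL CHECK (status IN ('PENDING', 'RUNNING', 'SUCCEEDED', 'DEAD')),
    attempt          INTEGER NOT NULL DEFAULT 0,
    next_attempt_at  INTEGER NOT NULL,
    lease_owner TEXT, lease_expires_at INTEGER,  -- which worker holds the run, and until when
    last_error       TEXT,              -- full logs would live elsewhere, e.g. object storage
    UNIQUE (job_id, scheduled_at)
);
CREATE INDEX idx_runs_due ON job_runs (status, next_attempt_at);
"""

# A run is claimable if it is due, or if its current lease has expired.
DUE = ("((status = 'PENDING' AND next_attempt_at <= :now)"
       " OR (status = 'RUNNING' AND lease_expires_at <= :now))")

def backoff_delay(attempt: int, rng: Callable[[], float] = random.random,
                  base_s: int = 30, cap_s: int = 3600) -> int:
    """Exponential backoff with full jitter: rng() * min(cap, base * 2**(attempt - 1))."""
    return int(rng() * min(cap_s, base_s * 2 ** (attempt - 1)))

def schedule_run(conn: sqlite3.Connection, job_id: str, scheduled_at: int) -> bool:
    """Idempotent enqueue: a duplicate trigger for the same occurrence is ignored."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO job_runs (run_id, job_id, scheduled_at, status, next_attempt_at)"
            " VALUES (?, ?, ?, 'PENDING', ?)",
            (f"{job_id}@{scheduled_at}", job_id, scheduled_at, scheduled_at))
    return cur.rowcount == 1

def claim(conn: sqlite3.Connection, worker_id: str, now: int) -> Optional[tuple[str, int]]:
    """Lease one claimable run; returns (run_id, attempt) or None."""
    row = conn.execute(
        "SELECT r.run_id, r.attempt, j.timeout_s FROM job_runs r JOIN jobs j USING (job_id)"
        f" WHERE {DUE} ORDER BY r.next_attempt_at LIMIT 1", {"now": now}).fetchone()
    if row is None:
        return None
    run_id, attempt, timeout_s = row
    with conn:
        # Compare-and-set on attempt: if another worker claimed it first, this matches 0 rows.
        cur = conn.execute(
            "UPDATE job_runs SET status = 'RUNNING', attempt = attempt + 1,"
            " lease_owner = :w, lease_expires_at = :exp"
            f" WHERE run_id = :id AND attempt = :a AND {DUE}",
            {"w": worker_id, "exp": now + timeout_s, "id": run_id, "a": attempt, "now": now})
    return (run_id, attempt + 1) if cur.rowcount == 1 else None

def heartbeat(conn: sqlite3.Connection, run_id: str, attempt: int, now: int, extend_s: int) -> bool:
    """Long-running jobs call this periodically so their lease does not expire mid-run."""
    with conn:
        cur = conn.execute(
            "UPDATE job_runs SET lease_expires_at = ?"
            " WHERE run_id = ? AND attempt = ? AND status = 'RUNNING'",
            (now + extend_s, run_id, attempt))
    return cur.rowcount == 1

def complete(conn: sqlite3.Connection, run_id: str, attempt: int) -> bool:
    """Record success only if this caller still holds the latest attempt (fencing)."""
    with conn:
        cur = conn.execute(
            "UPDATE job_runs SET status = 'SUCCEEDED', lease_owner = NULL, lease_expires_at = NULL"
            " WHERE run_id = ? AND attempt = ? AND status = 'RUNNING'", (run_id, attempt))
    return cur.rowcount == 1

def fail(conn: sqlite3.Connection, run_id: str, attempt: int, now: int, error: str,
         rng: Callable[[], float] = random.random) -> str:
    """Schedule a retry with backoff, or dead-letter the run; returns the new status."""
    (max_attempts,) = conn.execute(
        "SELECT j.max_attempts FROM job_runs r JOIN jobs j USING (job_id) WHERE r.run_id = ?",
        (run_id,)).fetchone()
    if attempt >= max_attempts:
        status, next_at = "DEAD", now  # dead-letter: stop retrying and alert a human
    else:
        status, next_at = "PENDING", now + backoff_delay(attempt, rng)
    with conn:
        cur = conn.execute(
            "UPDATE job_runs SET status = ?, next_attempt_at = ?, last_error = ?,"
            " lease_owner = NULL, lease_expires_at = NULL"
            " WHERE run_id = ? AND attempt = ? AND status = 'RUNNING'",
            (status, next_at, error, run_id, attempt))
    return status if cur.rowcount == 1 else "STALE"
```

A run can still execute twice if a slow but live worker outlasts its lease. For that reason, the job's own side effects should be idempotent, for example keyed on `run_id`, which is the usual way to get effectively-once behavior on top of at-least-once delivery. On a multi-writer database such as PostgreSQL, the claim step is commonly written with `SELECT ... FOR UPDATE SKIP LOCKED`, so many workers can poll the same table without contending for the same row.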
Note: The Robinhood interview invite often explicitly mentions "Job Scheduler", giving you time to prepare specifically for this topic!