Find the driver_ids that did not take a trip in the first 7 days after signup (a lot of optimizations were discussed based on data volume)
Find the top 3 cities each month, ranked by number of trips
Find the total time spent by each driver each day (start_time and end_time may span 2 days; this can be extended to multiple days)
Find First and Last Position of Element in Sorted Array (Leetcode)
Several questions on Spark optimizations
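For the total-time question above, the tricky part is a trip whose start_time and end_time fall on different calendar days. A minimal sketch of one way to handle it, assuming naive datetimes in a single time zone and a hypothetical input of `(driver_id, start_time, end_time)` tuples (the function name is also just illustrative): cut every trip at each midnight it crosses and credit each piece to its own day.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta


def time_per_driver_per_day(trips):
    """Return {(driver_id, date): seconds}, splitting trips at midnight.

    `trips` is an iterable of (driver_id, start_time, end_time) tuples.
    This shape is an assumption; the real table layout was not given.
    """
    totals = defaultdict(float)
    for driver_id, start, end in trips:
        cursor = start
        while cursor < end:
            # Midnight that closes the calendar day containing `cursor`.
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1), time.min
            )
            segment_end = min(end, next_midnight)
            totals[(driver_id, cursor.date())] += (
                segment_end - cursor
            ).total_seconds()
            cursor = segment_end
    return dict(totals)
```

The loop runs once per calendar day touched by the trip, so the same code covers the 2-day case and the multi-day extension. In SQL or Spark the equivalent idea is usually to explode each trip into one row per day before aggregating.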
Round 2 ( LLD ):
Build a system to generate the top 10 movies by category and by time frame for a streaming platform like Netflix.
Input table users_viewership: userid, movieid, date, starttime, endtime
Top 10 criteria: number of views, where a view is counted only if at least 80 percent of the run time was watched.
Design an algorithm for finding view counts of movies using the users_viewership table (merge intervals logic)
Design the data model for the remaining source tables required and for the warehousing tables (facts and dimensions)
Design an ETL strategy for generating this report daily
Write the complete set of SQL queries using these tables and generate the final table for reporting
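The merge-intervals step for view counts can be sketched roughly as below. This is one interpretation, not the expected interview answer: I assume starttime and endtime are minutes since midnight, that all rows for the same (userid, movieid, date) belong to one potential view, and that run time and category come from hypothetical lookups (`runtime_minutes`, `category_of`) standing in for a movies dimension table.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs so no minute is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_views(viewership, runtime_minutes, threshold=0.8):
    """Count views per movie from users_viewership-style rows.

    Each row is (userid, movieid, date, starttime, endtime). A view is
    counted when the merged watch time reaches `threshold` of the run time.
    """
    sessions = defaultdict(list)
    for userid, movieid, date, start, end in viewership:
        sessions[(userid, movieid, date)].append((start, end))

    views = defaultdict(int)
    for (_, movieid, _), intervals in sessions.items():
        watched = sum(end - start for start, end in merge_intervals(intervals))
        if watched >= threshold * runtime_minutes[movieid]:
            views[movieid] += 1
    return dict(views)


def top_n_by_category(views, category_of, n=10):
    """Rank movies inside each category by view count, ties by movieid."""
    by_category = defaultdict(list)
    for movieid, count in views.items():
        by_category[category_of[movieid]].append((movieid, count))
    return {
        category: sorted(items, key=lambda item: (-item[1], item[0]))[:n]
        for category, items in by_category.items()
    }
```

Merging before summing matters because overlapping or duplicated rows (a pause and resume, or the same session logged twice) would otherwise inflate the watched time past the 80 percent threshold. The time-frame dimension can be added by filtering rows on date before calling `count_views`.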
Round 3 ( HLD ):
Design a data pipeline for Netflix source clickstream events. Build a dashboard, refreshed hourly, that shows the top trending movies for each location. Criteria: number of views per movie, where a view means a full movie watch.
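The core aggregation behind that dashboard is an hourly tumbling window keyed by location. A minimal in-memory sketch, assuming each clickstream event is a dict with hypothetical fields `event_time`, `location`, `movie_id` and `event_type`, and that a hypothetical `"watch_complete"` event marks a full movie watch; a real pipeline would run the same logic in a stream processor rather than a Python loop.

```python
from collections import Counter, defaultdict


def hourly_top_movies(events, k=10):
    """Return {(hour_start, location): [(movie_id, views), ...]}.

    Only events flagged as a completed watch are counted. The field
    names and the "watch_complete" value are illustrative assumptions.
    """
    counts = defaultdict(Counter)
    for event in events:
        if event["event_type"] != "watch_complete":
            continue
        # Truncate the timestamp to the start of its hour (tumbling window).
        hour_start = event["event_time"].replace(
            minute=0, second=0, microsecond=0
        )
        counts[(hour_start, event["location"])][event["movie_id"]] += 1
    return {key: counter.most_common(k) for key, counter in counts.items()}
```

Late-arriving events, deduplication of repeated completion events, and how "full movie watch" is derived from raw play and pause events are the parts this sketch leaves out, and they are typically where an HLD discussion spends most of its time.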
Round 4 ( HM ):
Generic behavioural questions
Verdict: Rejected, as the feedback from my LLD and HM rounds was not a strong hire.