You are given a CSV file containing daily temperature readings for New York City (NYC) and several surrounding towns over a multi-year period. Each row is one day; the columns are: date, nyc_temp, town1_temp, town2_temp, …, townN_temp. Some readings are missing (NaN). Your task is to answer five statistical and prediction questions, then select the subset of five towns whose combined temperature history predicts NYC temperature with minimal out-of-sample error.

Part 1 – Answer exactly five questions:

1) Compute the overall standard deviation of NYC daily temperatures after dropping rows with missing values.
2) Find the median NYC temperature on days when at least one town's temperature is above 85 °F.
3) Fit a simple linear regression (single-town predictor) for each town and report the town whose model gives the smallest RMSE on the training data.
4) Fit a multiple linear regression with intercept using exactly two towns and report the pair that yields the smallest training RMSE.
5) Using the model from (4), predict the NYC temperature for the last 30 days of the dataset and report the mean absolute error (MAE); drop any day with a missing required value.

Part 2 – Greedy forward selection:

Starting from the empty set, iteratively add the town that most reduces 5-fold cross-validation RMSE when appended to the current set; stop once you have exactly five towns, and return their names in the order they were chosen.
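One way the five Part 1 questions could be answered is sketched below. Since the real CSV is not available here, the sketch runs on synthetic data whose columns follow the naming scheme from the prompt (nyc_temp, town1_temp, …); the helper names (fit, predict, train_rmse) and the use of ordinary least squares via numpy are illustrative choices, not a prescribed implementation.

```python
import numpy as np
import pandas as pd
from itertools import combinations

# Synthetic stand-in for the real CSV; columns mirror the prompt's schema.
rng = np.random.default_rng(0)
n = 400
base = 60 + 20 * np.sin(np.linspace(0, 8 * np.pi, n))
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=n),
    "nyc_temp": base + rng.normal(0, 2, n),
    "town1_temp": base + rng.normal(0, 3, n),
    "town2_temp": base + rng.normal(0, 4, n),
    "town3_temp": base + rng.normal(0, 5, n),
})
towns = [c for c in df.columns if c.startswith("town")]

def fit(cols, frame):
    """OLS with intercept: regress nyc_temp on the given town columns."""
    X = np.column_stack([np.ones(len(frame)), frame[list(cols)].values])
    beta, *_ = np.linalg.lstsq(X, frame["nyc_temp"].values, rcond=None)
    return beta

def predict(beta, cols, frame):
    return np.column_stack([np.ones(len(frame)), frame[list(cols)].values]) @ beta

# Q1: std of NYC temps after dropping rows with missing values.
clean = df.dropna()
q1_std = clean["nyc_temp"].std()

# Q2: median NYC temp on days when at least one town exceeds 85 °F.
hot = clean[(clean[towns] > 85).any(axis=1)]
q2_median = hot["nyc_temp"].median()

# Q3/Q4: best single town and best pair by training RMSE.
def train_rmse(cols):
    err = predict(fit(cols, clean), cols, clean) - clean["nyc_temp"].values
    return float(np.sqrt(np.mean(err ** 2)))

q3_best = min(towns, key=lambda t: train_rmse([t]))
q4_pair = min(combinations(towns, 2), key=train_rmse)

# Q5: MAE of the pair model on the final 30 days, dropping days with
# missing required values.
beta = fit(q4_pair, clean)
tail = df.tail(30).dropna(subset=list(q4_pair) + ["nyc_temp"])
q5_mae = float(np.mean(np.abs(predict(beta, q4_pair, tail)
                              - tail["nyc_temp"].values)))
```

Note that Q3 and Q4 rank models by training RMSE, as the prompt asks, so the pair found in Q4 is not guaranteed to generalize best; Q5's held-out MAE on the last 30 days is the only out-of-sample check in Part 1.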
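The Part 2 loop can be sketched as follows, again on synthetic data (seven hypothetical towns, so there are enough candidates to pick five). The 5-fold split is done with a shuffled index partition; the helper names (cv_rmse, fit, predict) are illustrative, not part of the task.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with seven hypothetical towns.
rng = np.random.default_rng(1)
n = 500
base = 55 + 25 * np.sin(np.linspace(0, 10 * np.pi, n))
data = {"nyc_temp": base + rng.normal(0, 2, n)}
for i in range(1, 8):
    data[f"town{i}_temp"] = base + rng.normal(0, i, n)
df = pd.DataFrame(data).dropna()
towns = [c for c in df.columns if c.startswith("town")]

def fit(cols, frame):
    """OLS with intercept: regress nyc_temp on the given town columns."""
    X = np.column_stack([np.ones(len(frame)), frame[list(cols)].values])
    beta, *_ = np.linalg.lstsq(X, frame["nyc_temp"].values, rcond=None)
    return beta

def predict(beta, cols, frame):
    return np.column_stack([np.ones(len(frame)), frame[list(cols)].values]) @ beta

def cv_rmse(cols, frame, k=5, seed=0):
    """Mean RMSE over k folds of a shuffled row partition."""
    idx = np.random.default_rng(seed).permutation(len(frame))
    scores = []
    for fold in np.array_split(idx, k):
        train = frame.drop(frame.index[fold])
        test = frame.iloc[fold]
        err = predict(fit(cols, train), cols, test) - test["nyc_temp"].values
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores))

# Greedy forward selection: grow the set one town at a time, always
# adding the candidate with the lowest 5-fold CV RMSE, until five towns.
selected, remaining = [], list(towns)
while len(selected) < 5:
    best = min(remaining, key=lambda t: cv_rmse(selected + [t], df))
    selected.append(best)
    remaining.remove(best)
```

One design point worth noting: greedy forward selection evaluates only O(5 · N) candidate sets rather than all C(N, 5) subsets, so it is cheap but can miss the globally best five-town subset when towns carry overlapping information.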