NYC Temperature Data Analysis


[ INFO ] category: Coding | difficulty: medium | freq: 5 | first seen: 2026-05-08


$ cat problem.md

You are given a CSV file containing daily temperature readings for New York City and several other towns. Each row contains a date, a town identifier, and the measured high temperature for that day. Some rows are missing temperature values. Your task is to:

1. Clean the data by filling or removing missing values appropriately.
2. For each town, compute the mean temperature and the standard deviation over the entire period.
3. Build a single-feature linear-regression model (no intercept) that predicts NYC’s temperature using one other town’s temperature. Choose the town that minimizes the root-mean-square error (RMSE) on the cleaned data.
4. Output the chosen town’s identifier and the regression coefficient rounded to three decimal places.
5. (Bonus) Select the 5 towns whose joint linear model (still no intercept) yields the lowest RMSE on the cleaned data; output their identifiers in any order.
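
The single-feature model in the steps above has a closed-form solution: with no intercept, the least-squares coefficient is sum(x*y) / sum(x*x). The sketch below illustrates that formula and the RMSE-based town selection on made-up toy data. The helper name fit_no_intercept and the town labels "A" and "B" are hypothetical and not part of the problem statement.

```python
import numpy as np


def fit_no_intercept(x, y):
    """Least-squares slope for y ~ beta * x (no intercept), plus its RMSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = float(x @ y / (x @ x))  # closed form: sum(x*y) / sum(x*x)
    rmse = float(np.sqrt(np.mean((y - beta * x) ** 2)))
    return beta, rmse


# Toy data, invented for illustration: NYC is exactly twice town "A".
nyc = [20.0, 40.0, 60.0]
towns = {"A": [10.0, 20.0, 30.0], "B": [30.0, 20.0, 10.0]}

# Fit one model per candidate town and keep the one with the lowest RMSE.
results = {name: fit_no_intercept(vals, nyc) for name, vals in towns.items()}
best_town = min(results, key=lambda name: results[name][1])
best_coef = round(results[best_town][0], 3)
print(best_town, best_coef)  # A 2.0
```

The same helper can be reused for every candidate town once the data has been cleaned and aligned by date.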

You should write a function analyze_temperatures(csv_path) that reads the file at csv_path, performs the above steps, and returns a tuple (best_town, coefficient, five_towns) where five_towns is a list of the five towns selected in the bonus step.
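
One possible shape for analyze_temperatures is sketched below using pandas and NumPy. Several details are assumptions, because the problem does not specify them: the CSV header is taken to be date, town, temp; NYC's identifier is taken to be "NYC"; missing values are filled by time-based interpolation before any remaining incomplete dates are dropped; and the bonus step uses an exhaustive search over all 5-town subsets, which is only practical for a modest number of towns. The target and k keyword arguments are additions for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def analyze_temperatures(csv_path, target="NYC", k=5):
    # Assumed header: date, town, temp (the problem does not name the columns).
    df = pd.read_csv(csv_path, parse_dates=["date"])

    # Wide table: one row per date, one column per town.
    wide = df.pivot_table(index="date", columns="town", values="temp").sort_index()

    # One possible cleaning policy: interpolate gaps over time, then drop
    # any dates that still contain a missing value.
    wide = wide.interpolate(method="time", limit_direction="both").dropna()

    # Step 2: per-town mean and sample standard deviation (ddof=1).
    # Computed here for completeness; the required return value omits it.
    stats = wide.agg(["mean", "std"]).T

    y = wide[target].to_numpy()
    others = [t for t in wide.columns if t != target]

    def fit(cols):
        """No-intercept least squares of NYC on the given town columns."""
        X = wide[list(cols)].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rmse = float(np.sqrt(np.mean((y - X @ beta) ** 2)))
        return rmse, beta

    # Steps 3 and 4: single town with the lowest RMSE and its coefficient.
    best_town = min(others, key=lambda t: fit([t])[0])
    coefficient = round(float(fit([best_town])[1][0]), 3)

    # Bonus: try every k-town subset and keep the one with the lowest RMSE.
    # Requires at least k towns besides the target.
    five_towns = min(combinations(others, k), key=lambda c: fit(c)[0])
    return best_town, coefficient, list(five_towns)
```

With many towns the number of subsets grows quickly, so a greedy forward selection may be a reasonable substitute for the exhaustive search, at the cost of no longer guaranteeing the lowest RMSE.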

user@intervues:~/two-sigma$