You are given a dataset from a content platform's recommendation system. Each record represents a user being shown a post, with data about their reading behavior across content categories and whether they clicked.
Your task is to build and evaluate a classification pipeline that predicts whether a user will click on a recommended post.
Objectives
Data Loading - Load the JSON dataset and convert it to a pandas DataFrame
Exploratory Data Analysis - Examine feature distributions, correlations, and the relationship between reading behavior and click outcomes
Feature Engineering - Prepare features for modeling (encode categoricals, create derived features like reading ratios)
Model Training - Train and compare at least 3-4 models:
Dummy classifier (baseline)
Logistic Regression
Random Forest
Gradient boosting (XGBoost if available, else GradientBoostingClassifier)