Content Recommendation: Click Prediction

Category: Coding · Machine Learning | Difficulty: medium | Frequency: high | First seen: 2026-01-13
Tags: ML, Classification, sklearn, Jupyter

You are given a dataset from a content platform's recommendation system. Each record represents one impression (a user being shown a post) and includes that user's reading behavior across content categories and whether they clicked.

Your task is to build and evaluate a classification pipeline that predicts whether a user will click on a recommended post.

Objectives

Data Loading - Load the JSON dataset and convert it to a pandas DataFrame
Exploratory Data Analysis - Examine feature distributions, correlations, and the relationship between reading behavior and click outcomes
Feature Engineering - Prepare features for modeling (encode categorical variables, create derived features such as reading ratios)
Model Training - Train and compare at least three (ideally all four) of the following models:
- Dummy classifier (baseline)
- Logistic Regression
- Random Forest
- Gradient boosting (XGBoost if available, otherwise scikit-learn's GradientBoostingClassifier)
Evaluation - Compare models using classification metrics (accuracy, precision, recall, F1, ROC-AUC)
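A minimal sketch of how these objectives might fit together in one script. The real dataset's field names are not given here, so `category`, `reads_in_category`, `total_reads`, and `clicked` are hypothetical stand-ins, and the records are synthetic (generated under the assumption that clicks become likelier as category-specific reading grows). It parses JSON records into a DataFrame, derives a reading-ratio feature, checks click rate by ratio quartile as a quick EDA step, one-hot encodes the categorical column inside a `ColumnTransformer`, and compares the four models on a held-out split using scikit-learn's `GradientBoostingClassifier` as the boosting model.

```python
import json

import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real JSON file; every field name is hypothetical.
rng = np.random.default_rng(0)
n = 2000
affinity = rng.random(n)  # latent interest in the post's category
total_reads = rng.integers(1, 100, size=n)
reads_in_category = rng.binomial(total_reads, affinity)
# Assumption: click probability rises with the share of category reading.
clicked = rng.random(n) < 0.05 + 0.6 * reads_in_category / total_reads
categories = rng.choice(["tech", "sports", "news"], size=n)
raw_json = json.dumps([
    {"category": str(c), "reads_in_category": int(r),
     "total_reads": int(t), "clicked": int(y)}
    for c, r, t, y in zip(categories, reads_in_category, total_reads, clicked)
])

# Data loading: with a real file this would be pd.DataFrame(json.load(f)).
df = pd.DataFrame(json.loads(raw_json))

# Feature engineering: share of the user's reading in the post's category.
df["category_read_ratio"] = df["reads_in_category"] / df["total_reads"].clip(lower=1)

# Quick EDA: click rate per reading-ratio quartile.
quartiles = pd.qcut(df["category_read_ratio"], 4, duplicates="drop")
print(df.groupby(quartiles, observed=True)["clicked"].mean())

X = df.drop(columns="clicked")
y = df["clicked"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["reads_in_category", "total_reads", "category_read_ratio"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])

models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "grad_boost": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # clone() gives each pipeline its own unfitted preprocessor.
    pipe = Pipeline([("prep", clone(preprocess)), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    proba = pipe.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }

print(pd.DataFrame(results).T.round(3))
```

The dummy baseline gives every row the same score, so its ROC-AUC is 0.5 and its precision, recall, and F1 are 0 when the majority class is "no click"; any useful model should clearly beat it. If the `xgboost` package is installed, `xgboost.XGBClassifier` could be added to the `models` dict in the same way.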

Guidelines

The setup cell loads the data as a pandas DataFrame called df
Use scikit-learn for modeling, preprocessing, and evaluation
Use matplotlib or seaborn for visualizations
Think about why certain features are predictive and what tradeoffs exist between models
If time permits, try cross-validation and hyperparameter tuning
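For the optional cross-validation and tuning step, one common approach is `GridSearchCV` with stratified folds and ROC-AUC as the scoring metric. This is a sketch under stated assumptions: `make_classification` generates a synthetic stand-in for the engineered click features, and the small random-forest grid is an arbitrary illustrative choice, not a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the engineered click features (about 30% positives).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)

# Stratified folds keep the click rate similar across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated ROC-AUC for a random forest with default depth settings.
baseline_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=cv, scoring="roc_auc",
)
print(f"default RF ROC-AUC: {baseline_scores.mean():.3f} +/- {baseline_scores.std():.3f}")

# Small illustrative grid; sensible ranges depend on the real data.
param_grid = {"max_depth": [None, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid, cv=cv, scoring="roc_auc",
)
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV ROC-AUC: {search.best_score_:.3f}")
```

`best_score_` is the mean cross-validated score of the configuration that was selected on those same folds, so it is optimistically biased; evaluating the refit `search.best_estimator_` on a held-out test set (or using nested cross-validation) gives a fairer final estimate.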

Available Packages

Many ML packages are pre-installed: pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, statsmodels, and more. To install additional packages:

```python
# micropip is Pyodide's package installer; top-level await works in a notebook cell
import micropip
await micropip.install('package-name')
```