This is an open-ended, hands-on machine learning coding exercise. You will train a binary classifier using scikit-learn on a toy dataset and iteratively improve the model's evaluation metrics.
The interviewer expects you to:
- Choose an appropriate toy dataset from scikit-learn
- Select and train a binary classifier
- Evaluate the model using appropriate metrics
- Identify and implement strategies to improve performance
- Explain your reasoning at each step
This question tests your practical knowledge of the ML workflow, understanding of evaluation metrics, and ability to iterate on model improvements.
- Start with a simple baseline and iteratively improve
- Explain your choices for metrics, models, and hyperparameters
- Be prepared to discuss trade-offs and production considerations
- Write clean, reproducible code with proper train/test splits
You can choose from any of these binary-friendly datasets:
```python
from sklearn.datasets import (
    load_breast_cancer,   # Binary: malignant/benign, 569 samples, 30 features
    load_iris,            # Multi-class, but take 2 classes for binary
    make_classification,  # Synthetic with controllable difficulty
    make_moons,           # Non-linearly separable, good for testing
    make_circles,         # Concentric circles, tests non-linear models
)
```
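For instance, `load_iris` is three-class, but filtering to the first two classes yields a clean binary problem (a quick sketch):

```python
from sklearn.datasets import load_iris

# load_iris has 3 classes; keep only setosa (0) and versicolor (1)
X, y = load_iris(return_X_y=True)
mask = y < 2
X_bin, y_bin = X[mask], y[mask]

print(X_bin.shape)         # 100 samples remain (50 per class)
print(sorted(set(y_bin)))  # [0, 1]
```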
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
- Use stratified splitting to preserve class distribution
- Scale features using StandardScaler (fit on train, transform on test)
- Never fit the scaler on test data (data leakage)
- Set random_state for reproducibility
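One way to make the no-leakage rule automatic is to put scaling inside a `Pipeline`, so cross-validation refits the scaler on each training fold. A minimal sketch, assuming the breast-cancer dataset chosen above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline re-fits StandardScaler on each CV training fold,
# so the held-out fold never influences the scaling parameters
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
print(f"5-fold F1: {scores.mean():.4f} +/- {scores.std():.4f}")
```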
Common Classifiers

| Model | Strengths | When to Use |
|---|---|---|
| Logistic Regression | Fast, interpretable, good baseline | Linear relationships |
| Random Forest | Handles non-linearity, feature interactions | Mixed features, no scaling needed |
| SVM | Works well in high dimensions | Clean data, smaller datasets |
| Gradient Boosting | High accuracy, handles imbalance | When accuracy is critical |
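A quick way to compare these candidates is to fit each one on the same split and report a common metric. This is only a sketch with default hyperparameters; tree-based models don't need scaling, but they tolerate it, so the scaled features are reused for all four:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    results[name] = f1_score(y_test, model.predict(X_test_s))
    print(f"{name:20s} F1: {results[name]:.4f}")
```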
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)
print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score:  {f1_score(y_test, y_pred):.4f}")
```
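To iterate beyond this baseline, one common step is tuning the regularization strength `C` with cross-validated grid search. A sketch (the grid values here are illustrative, not prescribed by the exercise):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
)
# Pipeline param names are "<step name>__<param>";
# make_pipeline names the step after the lowercased class
grid = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="f1",
)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_["logisticregression__C"])
print(f"test F1: {grid.score(X_test, y_test):.4f}")
```

Explaining why you chose the grid and the scoring metric is exactly the kind of reasoning the interviewer is listening for.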
- Accuracy: good for balanced classes, misleading for imbalanced data
- Precision: important when false positives are costly (e.g., spam detection)
- Recall: important when false negatives are costly (e.g., cancer detection)
- F1 Score: balanced metric when both precision and recall matter
- AUC-ROC: good for ranking and threshold-agnostic evaluation
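AUC-ROC needs scores rather than hard 0/1 labels, so it is computed from `predict_proba` (or `decision_function`). A sketch continuing the logistic-regression baseline above, with a confusion matrix to ground the precision/recall discussion:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(scaler.fit_transform(X_train), y_train)

# AUC uses the predicted probability of the positive class,
# not the thresholded 0/1 predictions
y_score = model.predict_proba(scaler.transform(X_test))[:, 1]
print(f"AUC-ROC: {roc_auc_score(y_test, y_score):.4f}")
print(confusion_matrix(y_test, model.predict(scaler.transform(X_test))))
```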