Image Classifier with Noisy Data


[ INFO ] category: Coding · Domain Specific · difficulty: hard · freq: medium · first seen: 2026-01-13

[HARD] [DOMAIN SPECIFIC] [MEDIUM] Machine Learning · Computer Vision · Data Cleaning · Classification

$ cat problem.md

You are given a labeled image dataset that is known to contain two kinds of corruption:

Pixel-level noise: a large fraction of images are blurred, JPEG-compressed, or shot under very low light.
Label-level noise: an unknown but non-negligible percentage of the provided labels are simply wrong.
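Both kinds of corruption can be screened automatically before training. The sketch below assumes grayscale NumPy arrays with values in [0, 255]. It flags pixel-level problems with three cheap statistics:

- variance of the Laplacian, for blur;
- mean intensity, for low light;
- the ratio of intensity steps across 8-pixel block edges to steps elsewhere, for JPEG artifacts.

For label noise it uses a simplified form of the per-class thresholds from confident learning. All function names and threshold values here are hypothetical and would need tuning on a sample of the data. `pred_probs` is assumed to come from k-fold cross-validation of some baseline model.

```python
import numpy as np

# ---- Pixel-level screening (hypothetical thresholds; tune on real data) ----

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian response; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def blockiness(gray: np.ndarray, block: int = 8) -> float:
    """Mean horizontal step across block edges divided by the mean step
    elsewhere; values well above 1 suggest JPEG blocking artifacts."""
    diff = np.abs(np.diff(gray.astype(np.float64), axis=1))
    # diff[:, j] compares columns j and j+1; an edge is where j+1 % block == 0.
    at_edge = (np.arange(diff.shape[1]) % block) == block - 1
    return float(diff[:, at_edge].mean() / (diff[:, ~at_edge].mean() + 1e-8))


def pixel_flags(gray, blur_thresh=100.0, dark_thresh=40.0, block_thresh=1.5):
    """Flag a grayscale image as blurry, dark (low light), or JPEG-blocky."""
    return {
        "blurry": laplacian_variance(gray) < blur_thresh,
        "dark": float(np.mean(gray)) < dark_thresh,
        "jpeg_blocky": blockiness(gray) > block_thresh,
    }


# ---- Label-level screening (simplified confident-learning thresholds) ----

def label_issues(labels, pred_probs):
    """Flag samples whose given label disagrees with a confident prediction.

    pred_probs must be out-of-sample (e.g. from k-fold cross-validation),
    so that no model scores the labels it was trained on.
    """
    labels = np.asarray(labels)
    probs = np.asarray(pred_probs, dtype=np.float64)
    k = probs.shape[1]
    # Per-class threshold: mean predicted probability of class c among
    # samples currently labeled c (1.0 if no sample carries that label).
    thresh = np.array([probs[labels == c, c].mean() if np.any(labels == c)
                       else 1.0 for c in range(k)])
    confident = probs >= thresh[None, :]
    # Most probable class among those that clear their own threshold.
    suggested = np.argmax(np.where(confident, probs, -np.inf), axis=1)
    has_conf = confident.any(axis=1)
    issues = has_conf & (suggested != labels)
    # Samples with no confident class keep their given label.
    return issues, np.where(has_conf, suggested, labels)
```

How the flags are used is a separate design decision. Flagged images could be denoised, down-weighted, or kept and covered by matching augmentations. Flagged labels could be dropped, down-weighted, or replaced by `suggested`; the last option is automatic, not human, relabeling.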

Your task is to design and train an image classifier that reaches the highest possible accuracy on a held-out, manually verified clean test set. You may use any modeling, preprocessing, or training tricks you wish, but you must use the raw noisy files as training data; you are not allowed to ask for human relabeling. During the onsite interview you will:

a) Write and justify code that detects which images are corrupted and decides whether to denoise them or keep them as-is.
b) Build a preprocessing and training pipeline that is robust to both kinds of noise, and explain why it should work.
c) Report the final test accuracy and analyze how much of the improvement came from handling pixel noise versus label noise.
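For part (c), one way to separate the two contributions is a 2×2 ablation, with every run evaluated on the same clean test set:

- neither fix;
- only the pixel-noise handling;
- only the label-noise handling;
- both fixes.

Because the fixes can interact, each one's marginal gain depends on the order in which it is added. The hypothetical helper below averages over both orders (the two-player Shapley value), so the two shares always sum to the total gain.

```python
def attribute_gains(base, pixel_only, label_only, both):
    """Split the accuracy gain of a 2x2 ablation between two interventions.

    Arguments are clean-test accuracies (percent) for: no noise handling,
    pixel-noise handling only, label-noise handling only, and both.
    Each share averages the fix's marginal gain over both orderings.
    """
    pixel = 0.5 * ((pixel_only - base) + (both - label_only))
    label = 0.5 * ((label_only - base) + (both - pixel_only))
    # Negative interaction: the fixes overlap; positive: they reinforce.
    interaction = both - pixel_only - label_only + base
    return {"pixel": pixel, "label": label,
            "interaction": interaction, "total": both - base}


# Hypothetical accuracies (placeholders, not measured results).
print(attribute_gains(70.0, 76.0, 78.0, 82.0))
# → {'pixel': 5.0, 'label': 7.0, 'interaction': -2.0, 'total': 12.0}
```

A negative interaction term, as in the placeholder numbers, would mean the two fixes partly address the same images, for example images that are both degraded and mislabeled. Differences of a point or two between single runs can fall within seed-to-seed variance. Averaging each of the four cells over several seeds makes the split more trustworthy.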

user@intervues:~/openai$